Hi, I am trying to create a new table or data set differently based on logic that looks for specific records in a data set. For example, I want to create a new data set with a WTD, MTD, YTD, and LTD column from the original data set if all of those exist. If they don't (in my case there is no WTD in the original data set) then I want to insert a dummy WTD record and union it to the MTD, YTD, and LTD records from the original data set for the new data set. The way I have written the code, I get the following error:
ERROR: A Composite expression (usually a subquery) is used incorrectly in an expression.
Here is the code I have written. Is there a way to acomplish what I am tring to do?
proc sql;
create table FLOWFRED.FMFLOW_LOANSFUNDED_20YR_RPT_APP as
SELECT DISTINCT
CASE
WHEN EXISTS
(
select *
from FLOWFRED.FMFLOW_LOANSFUNDED_20YR_RPT a3
where TIMEFRAME = 'WTD'
)
THEN
(
select
b.TIMEFRAME,
b._TYPE_ as _TYPE_,
b._PAGE_ as _PAGE_,
b._TABLE_ as _TABLE_,
b.INVESTOR_LOAN_ID_N,
b.NOTE_BALANCE_Sum,
b.NOTE_BALANCE_Mean,
b.WGT_AVG_20YR_Mean,
b.LTV_Mean,
b.FICO_Mean,
b.SRP_RATE_Mean,
b.SRP_AMT_Mean,
b.SRP_AMT_Sum,
b.PCT_TOTAL_UPB_PURCHASE_Mean,
b.PCT_TOTAL_UPB_REFI_Mean,
b.PCT_TOTAL_CNT_ESCROW_Mean,
b.PCT_TOTAL_CNT_NONESCROW_Mean,
CASE
WHEN b.TIMEFRAME = 'WTD' then 1
WHEN b.TIMEFRAME = 'MTD' then 2
WHEN b.TIMEFRAME = 'YTD' then 3
WHEN b.TIMEFRAME = 'LTD' then 4
END as TIMEFRAME_NO
from FLOWFRED.FMFLOW_LOANSFUNDED_20YR_RPT b
/*order by TIMEFRAME_NO*/
)
else
(
SELECT
'WTD' as TIMEFRAME,
'1' as _TYPE_,
1 as _PAGE_,
1 as _TABLE_,
0 as INVESTOR_LOAN_ID_N,
0 as NOTE_BALANCE_Sum,
0 as NOTE_BALANCE_Mean,
0 as WGT_AVG_20YR_Mean,
0 as LTV_Mean,
0 as FICO_Mean,
0 as SRP_RATE_Mean,
0 as SRP_AMT_Mean,
0 as SRP_AMT_Sum,
0 as PCT_TOTAL_UPB_PURCHASE_Mean,
0 as PCT_TOTAL_UPB_REFI_Mean,
0 as PCT_TOTAL_CNT_ESCROW_Mean,
0 as PCT_TOTAL_CNT_NONESCROW_Mean,
1 as TIMEFRAME_NO
from FLOWFRED.FMFLOW_LOANSFUNDED_20YR_RPT a
union
select
b.TIMEFRAME,
b._TYPE_ as _TYPE_,
b._PAGE_ as _PAGE_,
b._TABLE_ as _TABLE_,
b.INVESTOR_LOAN_ID_N,
b.NOTE_BALANCE_Sum,
b.NOTE_BALANCE_Mean,
b.WGT_AVG_20YR_Mean,
b.LTV_Mean,
b.FICO_Mean,
b.SRP_RATE_Mean,
b.SRP_AMT_Mean,
b.SRP_AMT_Sum,
b.PCT_TOTAL_UPB_PURCHASE_Mean,
b.PCT_TOTAL_UPB_REFI_Mean,
b.PCT_TOTAL_CNT_ESCROW_Mean,
b.PCT_TOTAL_CNT_NONESCROW_Mean,
CASE
WHEN b.TIMEFRAME = 'WTD' then 1
WHEN b.TIMEFRAME = 'MTD' then 2
WHEN b.TIMEFRAME = 'YTD' then 3
WHEN b.TIMEFRAME = 'LTD' then 4
END as TIMEFRAME_NO
from FLOWFRED.FMFLOW_LOANSFUNDED_20YR_RPT b
/*order by TIMEFRAME_NO*/
) END as x
FROM FLOWFRED.FMFLOW_LOANSFUNDED_20YR_RPT a;
quit;
Thanks @elwayfan446 for the clarification. Then I would agree with those who have suggested to use a "master list".
/* Create a dataset containing all time frame codes */
data timeframes;
length timeframe $3; /* Please adapt $3 to the length of TIMEFRAME in HAVE! */
input timeframe;
cards;
LTD
MTD
WTD
YTD
;
/* Prepare the HAVE dataset, if it is not sorted by TIMEFRAME */
proc sort data=have;
by timeframe;
run;
/* Insert dummy observations as desired */
data want;
merge have
timeframes;
by timeframe;
run;
Hi,
Sorry, thats a bit TL;DR for me. I can tell at a brief glance that that wont work or wont be the best way of doing it. I would recommend posting a small datastep with some test data in, just a few lines, and what you want the output to look like. At a guess on the logic, first create a distinct table of the data you want out, then just left join onto that the data from your main table where the condition is true.
proc sql; create table WANT as select distinct ... from HAVE A left join (select ... where condition is true) B on ...; quit;
Its vague as I have no test data.
Thanks RW9. I was not sure if a left join would work since this would be acting like a merge or union. Maybe if I lay it out like this:
Original Table has the following observations:
MTD field 2, field 3, etc.
YTD field 2, field 3, etc.
LTD field 2, field 3, etc.
In that table, there is no WTD. In the new table I need a WTD so I would create a "dummy" record with zeros for the other fields. I need to check to see if that WTD exists initially to determine if I need a dummy record or not. If it exists, then I simply need to take all data from the original table.
Final table would look like
WTD, field 2, field 3, etc. (whether this is from dummy data or the data in the original table)
MTD field 2, field 3, etc.
YTD field 2, field 3, etc.
LTD field 2, field 3, etc.
Missing level problem. Create a table that has all the levels you need. Then you merge them together and if a value doesn't exist the join will add it in, but if all records are present it will work as intended.
Reeza, can you point me to an example of this? I am new to this one.
@RW9 solution is the method I was suggesting.
Basically, you require a master list. Then merge your data into the master table, and any missing records are added in via the join.
Data is key here to understanding, post some test data. Here is a an example (verbose coding but I am just leaving for the day). Create a template dataset, then merge back to it:
data have; id=1; a="MTD"; field2=1; output; id=1; a="YTD"; field2=1; output; id=1; a="LTD"; field2=1; output; id=2; a="MTD"; field2=1; output; id=2; a="YTD"; field2=1; output; id=2; a="LTD"; field2=1; output; id=2; a="WTD"; field2=1; output; run; proc sort data=have out=tmp nodupkey; by id; run; data tmp; set tmp (keep=id); do a="MTD","YTD","LTD","WTD"; output; end; run; proc sql; create table WANT as select A.*, B.FIELD2 from TMP A left join HAVE B on A.ID=B.ID and A.A=B.A; quit;
Thanks, I will give this a try too.
I have taken this code and modified it for my data.
data FLOWFRED.FMFLOW_LOANSFUNDED_20YR_RPT;
id=1; a="MTD"; _TYPE_=1; output;
id=1; a="YTD"; _TYPE_=1; output;
id=1; a="LTD"; _TYPE_=1; output;
id=2; a="WTD"; _TYPE_=1; output;
id=2; a="MTD"; _TYPE_=1; output;
id=2; a="YTD"; _TYPE_=1; output;
id=2; a="LTD"; _TYPE_=1; output;
run ;
proc sort data=flowfred.fmflow_loansfunded_20yr_rpt out=tmp nodupkey;
by id;
run;
data tmp;
set tmp (keep=id _TYPE_);
do a="WTD","MTD","YTD","LTD";
output;
end;
run;
proc sql;
create table FLOWFRED.FMFLOW_LOANSFUNDED_20YR_RPT_APP as
select A.*,
b._TYPE_
from tmp A
left join flowfred.fmflow_loansfunded_20yr_rpt b on a.id=b.id
and a.a=b.a;
quit;
Here are the results I am getting from this. Note that currently the original data set currently has an observation that includes WTD in the variable TIMEFRAME. How should I interpret the results I am getting? Is this what would be expected?
id _TYPE_ a
1 1 WTD
1 1 MTD
1 1 YTD
1 1 LTD
2 1 WTD
2 1 MTD
2 1 YTD
2 1 LTD
Hi,
Let me try again. Post some example test data - in the form of a datastep, and what you want the output to look like. Each of the replies given are of the same form - create the elements you need in the final dataset, then merge this back to original data. The merge takes care of missing records. However we can't provide exact scenarios without seeing what you have and want.
This sounds like a good example of when to use macro language. To prepare, get two working versions of your program. One should work when WTD does exist, and the other should work when WTD doesn't exist.
When those are ready, it is likely a straightforward macro language application. Determine whether WTD exists, and have the macro implement one version of the program or the other.
Once we're that far, we can talk about the best implementation that allows for sets of variables to exist or not. (You won't need dozens of versions of a working program. Macro language will take care of that.)
It looks like you try to use a CASE structure to create a new table conditionally. However, to my knowledge, you can only use it to create a new variable. Also, in the beginning you talk about a WTD column, but later about a WTD record (i.e row) and in your code 'WTD' occurs only as a value of a character variable TIMEFRAME.
In your second post you show a dataset with only three observations. Is this true also for your real data or do you have more records there, maybe with a key variable (not shown so far; see RW9's variable ID) and blocks of 3 or 4 records with TIMEFRAME values 'MTD', 'YTD', 'LTD' and possibly 'WTD'? If this is the case, I think you could achieve what you want even without merging, PROC SQL, auxiliary datasets or macros, in a single data step.
Sorry for the confusion about this. Yes, WTD, MTD, YTD, and LTD would be values of the TIMEFRAME variable in the data set. The data set has several variables other than TIMEFRAME for each observation. My goal is to create a dummy observation for WTD if it doesn't exist in the original data set, otherwise, just pull the observations for WTD, MTD, YTD, and LTD that will be in the original data set.
I hope that is a better explanation. Again, sorry for the confusion.
So, assuming your existing dataset (call it HAVE) has only 3 or 4 observations (depending on whether an observation with TIMEFRAME='WTD' exists), you could create the new dataset (call it WANT) as shown below:
/* Scenario 1: WTD record exists */
data have;
input timeframe $ field2 field3;
cards;
MTD 1 11
YTD 2 22
LTD 3 33
WTD 4 44
;
data want;
set have end=last;
wtdex+(timeframe='WTD');
output;
if last & not wtdex then do;
call missing(of _all_);
timeframe='WTD';
output;
end;
drop wtdex;
run;
/* Scenario 2: WTD record does not exist */
data have;
set have(obs=3);
run;
/* data want; ... same solution as above */
Does that solution extend to past WTD? What if LTD is missing instead?
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.