Solved: Re: Creating tables with proc sql based on subqueries?

elwayfan446 · Posted 01-13-2016 10:10 AM

Hi, I am trying to create a new table or data set differently based on logic that looks for specific records in a data set. For example, I want to create a new data set with a WTD, MTD, YTD, and LTD column from the original data set if all of those exist. If they don't (in my case there is no WTD in the original data set) then I want to insert a dummy WTD record and union it to the MTD, YTD, and LTD records from the original data set for the new data set. The way I have written the code, I get the following error:

ERROR: A Composite expression (usually a subquery) is used incorrectly in an expression.

Here is the code I have written. Is there a way to acomplish what I am tring to do?

proc sql;
	create table FLOWFRED.FMFLOW_LOANSFUNDED_20YR_RPT_APP as
		SELECT DISTINCT
			CASE 
				WHEN EXISTS
					(
					  select * 
					  from FLOWFRED.FMFLOW_LOANSFUNDED_20YR_RPT a3
					  where TIMEFRAME = 'WTD'
					)
				THEN 
					(
						select 
					   		b.TIMEFRAME,
							b._TYPE_ as _TYPE_,
							b._PAGE_ as _PAGE_,
							b._TABLE_ as _TABLE_,
							b.INVESTOR_LOAN_ID_N,
							b.NOTE_BALANCE_Sum,
							b.NOTE_BALANCE_Mean,
							b.WGT_AVG_20YR_Mean,
							b.LTV_Mean,
							b.FICO_Mean,
							b.SRP_RATE_Mean,
							b.SRP_AMT_Mean,
							b.SRP_AMT_Sum,
							b.PCT_TOTAL_UPB_PURCHASE_Mean,
							b.PCT_TOTAL_UPB_REFI_Mean,
							b.PCT_TOTAL_CNT_ESCROW_Mean,
							b.PCT_TOTAL_CNT_NONESCROW_Mean,
							CASE
								WHEN b.TIMEFRAME = 'WTD' then 1
								WHEN b.TIMEFRAME = 'MTD' then 2
								WHEN b.TIMEFRAME = 'YTD' then 3
								WHEN b.TIMEFRAME = 'LTD' then 4
							END as TIMEFRAME_NO
						from FLOWFRED.FMFLOW_LOANSFUNDED_20YR_RPT b
						/*order by TIMEFRAME_NO*/
					)
				else 
					(
						SELECT 
							'WTD' as TIMEFRAME,
							'1' as _TYPE_,
							1 as _PAGE_,
							1 as _TABLE_,
							0 as INVESTOR_LOAN_ID_N,
							0 as NOTE_BALANCE_Sum,
							0 as NOTE_BALANCE_Mean,
							0 as WGT_AVG_20YR_Mean,
							0 as LTV_Mean,
							0 as FICO_Mean,
							0 as SRP_RATE_Mean,
							0 as SRP_AMT_Mean,
							0 as SRP_AMT_Sum,
							0 as PCT_TOTAL_UPB_PURCHASE_Mean,
							0 as PCT_TOTAL_UPB_REFI_Mean,
							0 as PCT_TOTAL_CNT_ESCROW_Mean,
							0 as PCT_TOTAL_CNT_NONESCROW_Mean,
							1 as TIMEFRAME_NO
					    from FLOWFRED.FMFLOW_LOANSFUNDED_20YR_RPT a	
					    union 
					    select 
					   		b.TIMEFRAME,
							b._TYPE_ as _TYPE_,
							b._PAGE_ as _PAGE_,
							b._TABLE_ as _TABLE_,
							b.INVESTOR_LOAN_ID_N,
							b.NOTE_BALANCE_Sum,
							b.NOTE_BALANCE_Mean,
							b.WGT_AVG_20YR_Mean,
							b.LTV_Mean,
							b.FICO_Mean,
							b.SRP_RATE_Mean,
							b.SRP_AMT_Mean,
							b.SRP_AMT_Sum,
							b.PCT_TOTAL_UPB_PURCHASE_Mean,
							b.PCT_TOTAL_UPB_REFI_Mean,
							b.PCT_TOTAL_CNT_ESCROW_Mean,
							b.PCT_TOTAL_CNT_NONESCROW_Mean,
							CASE
								WHEN b.TIMEFRAME = 'WTD' then 1
								WHEN b.TIMEFRAME = 'MTD' then 2
								WHEN b.TIMEFRAME = 'YTD' then 3
								WHEN b.TIMEFRAME = 'LTD' then 4
							END as TIMEFRAME_NO
						from FLOWFRED.FMFLOW_LOANSFUNDED_20YR_RPT b
						/*order by TIMEFRAME_NO*/
					) END as x
		FROM FLOWFRED.FMFLOW_LOANSFUNDED_20YR_RPT a;
quit;

FreelanceReinh · Posted 01-13-2016 12:26 PM

Thanks @elwayfan446 for the clarification. Then I would agree with those who have suggested to use a "master list".

/* Create a dataset containing all time frame codes */

data timeframes;
length timeframe $3; /* Please adapt $3 to the length of TIMEFRAME in HAVE! */
input timeframe;
cards;
LTD
MTD
WTD
YTD
;

/* Prepare the HAVE dataset, if it is not sorted by TIMEFRAME */

proc sort data=have;
by timeframe;
run;

/* Insert dummy observations as desired */

data want;
merge have
      timeframes;
by timeframe;
run;

View solution in original post

RW9 · Posted 01-13-2016 10:23 AM

Hi,

Sorry, thats a bit TL;DR for me. I can tell at a brief glance that that wont work or wont be the best way of doing it. I would recommend posting a small datastep with some test data in, just a few lines, and what you want the output to look like. At a guess on the logic, first create a distinct table of the data you want out, then just left join onto that the data from your main table where the condition is true.

proc sql;
  create table WANT as
  select  distinct ...
  from    HAVE A
  left join (select ... where condition is true) B
  on ...;
quit;

Its vague as I have no test data.

elwayfan446 · Posted 01-13-2016 10:32 AM

Thanks RW9. I was not sure if a left join would work since this would be acting like a merge or union. Maybe if I lay it out like this:

Original Table has the following observations:

MTD field 2, field 3, etc.

YTD field 2, field 3, etc.

LTD field 2, field 3, etc.

In that table, there is no WTD. In the new table I need a WTD so I would create a "dummy" record with zeros for the other fields. I need to check to see if that WTD exists initially to determine if I need a dummy record or not. If it exists, then I simply need to take all data from the original table.

Final table would look like

WTD, field 2, field 3, etc. (whether this is from dummy data or the data in the original table)

MTD field 2, field 3, etc.

YTD field 2, field 3, etc.

LTD field 2, field 3, etc.

Reeza · Posted 01-13-2016 10:41 AM

Missing level problem. Create a table that has all the levels you need. Then you merge them together and if a value doesn't exist the join will add it in, but if all records are present it will work as intended.

elwayfan446 · Posted 01-13-2016 10:46 AM

Reeza, can you point me to an example of this? I am new to this one.

Reeza · Posted 01-13-2016 10:51 AM

@RW9 solution is the method I was suggesting.

Basically, you require a master list. Then merge your data into the master table, and any missing records are added in via the join.

RW9 · Posted 01-13-2016 10:43 AM

Data is key here to understanding, post some test data. Here is a an example (verbose coding but I am just leaving for the day). Create a template dataset, then merge back to it:

data have;
  id=1; a="MTD"; field2=1; output;
  id=1; a="YTD"; field2=1; output;
  id=1; a="LTD"; field2=1; output;
  id=2; a="MTD"; field2=1; output;
  id=2; a="YTD"; field2=1; output;
  id=2; a="LTD"; field2=1; output;
  id=2; a="WTD"; field2=1; output;
run;

proc sort data=have out=tmp nodupkey;
  by id;
run;
data tmp;
  set tmp (keep=id);
  do a="MTD","YTD","LTD","WTD";
    output;
  end;
run;

proc sql;
  create table WANT as
  select  A.*,
          B.FIELD2
  from    TMP A
  left join HAVE B
  on      A.ID=B.ID
  and     A.A=B.A;
quit;

elwayfan446 · Posted 01-13-2016 10:47 AM

Thanks, I will give this a try too.

elwayfan446 · Posted 01-13-2016 11:15 AM

I have taken this code and modified it for my data.

data FLOWFRED.FMFLOW_LOANSFUNDED_20YR_RPT;
	id=1; a="MTD"; _TYPE_=1; output;
	id=1; a="YTD"; _TYPE_=1; output;
	id=1; a="LTD"; _TYPE_=1; output;
	id=2; a="WTD"; _TYPE_=1; output;
	id=2; a="MTD"; _TYPE_=1; output;
	id=2; a="YTD"; _TYPE_=1; output;
	id=2; a="LTD"; _TYPE_=1; output;
run ;

proc sort data=flowfred.fmflow_loansfunded_20yr_rpt out=tmp nodupkey;
	by id;
run;

data tmp;
	set tmp (keep=id _TYPE_);
	do a="WTD","MTD","YTD","LTD";
		output;
	end;
run;

proc sql;
	create table FLOWFRED.FMFLOW_LOANSFUNDED_20YR_RPT_APP as
	select A.*,
		b._TYPE_
	from tmp A
	left join flowfred.fmflow_loansfunded_20yr_rpt b on a.id=b.id
		and a.a=b.a;
quit;

Here are the results I am getting from this. Note that currently the original data set currently has an observation that includes WTD in the variable TIMEFRAME. How should I interpret the results I am getting? Is this what would be expected?

id _TYPE_ a

1       1            WTD
1       1      MTD
1       1            YTD
1       1      LTD
2       1           WTD
2       1           MTD
2       1          YTD
2       1          LTD

RW9 · Posted 01-13-2016 12:55 PM

Hi,

Let me try again. Post some example test data - in the form of a datastep, and what you want the output to look like. Each of the replies given are of the same form - create the elements you need in the final dataset, then merge this back to original data. The merge takes care of missing records. However we can't provide exact scenarios without seeing what you have and want.

Astounding · Posted 01-13-2016 10:48 AM

This sounds like a good example of when to use macro language. To prepare, get two working versions of your program. One should work when WTD does exist, and the other should work when WTD doesn't exist.

When those are ready, it is likely a straightforward macro language application. Determine whether WTD exists, and have the macro implement one version of the program or the other.

Once we're that far, we can talk about the best implementation that allows for sets of variables to exist or not. (You won't need dozens of versions of a working program. Macro language will take care of that.)

FreelanceReinh · Posted 01-13-2016 10:51 AM

It looks like you try to use a CASE structure to create a new table conditionally. However, to my knowledge, you can only use it to create a new variable. Also, in the beginning you talk about a WTD column, but later about a WTD record (i.e row) and in your code 'WTD' occurs only as a value of a character variable TIMEFRAME.

In your second post you show a dataset with only three observations. Is this true also for your real data or do you have more records there, maybe with a key variable (not shown so far; see RW9's variable ID) and blocks of 3 or 4 records with TIMEFRAME values 'MTD', 'YTD', 'LTD' and possibly 'WTD'? If this is the case, I think you could achieve what you want even without merging, PROC SQL, auxiliary datasets or macros, in a single data step.

elwayfan446 · Posted 01-13-2016 10:55 AM

Sorry for the confusion about this. Yes, WTD, MTD, YTD, and LTD would be values of the TIMEFRAME variable in the data set. The data set has several variables other than TIMEFRAME for each observation. My goal is to create a dummy observation for WTD if it doesn't exist in the original data set, otherwise, just pull the observations for WTD, MTD, YTD, and LTD that will be in the original data set.

I hope that is a better explanation. Again, sorry for the confusion.

FreelanceReinh · Posted 01-13-2016 11:18 AM

So, assuming your existing dataset (call it HAVE) has only 3 or 4 observations (depending on whether an observation with TIMEFRAME='WTD' exists), you could create the new dataset (call it WANT) as shown below:

/* Scenario 1: WTD record exists */

data have;
input timeframe $ field2 field3;
cards;
MTD 1 11
YTD 2 22
LTD 3 33
WTD 4 44
;

data want;
set have end=last;
wtdex+(timeframe='WTD');
output;
if last & not wtdex then do;
  call missing(of _all_);
  timeframe='WTD';
  output;
end;
drop wtdex;
run;

/* Scenario 2: WTD record does not exist */

data have;
set have(obs=3);
run;

/* data want; ... same solution as above */

Reeza · Posted 01-13-2016 11:29 AM

Does that solution extend to past WTD? What if LTD is missing instead?

Registration is open

SAS Training: Just a Click Away