About RogerSpeas

RogerSpeas · ‎02-02-2017

The DATA step array code would be the most efficient, as it requires only one pass of the table. The PROC TRANSPOSE code requires that you dedup the id-curr_cat combination before transposing (multiple categories create multiple rows for the id). PROC PRINT is probably the easiest way to create a CSV file...

ods csv file='c:\item_sample.txt';
proc print data=item(obs=100) noobs;
  var cust_edp_id curr_cat;
  *where ranuni(6)>.70; *if necessary, you can add this to get a 30% sample of the first 100, adjust OBS= and percentage accordingly;
run;
ods csv close;

RogerSpeas · ‎02-02-2017

As you previously posted a PROC TRANSPOSE question, I thought I would give you two examples, one using PROC TRANSPOSE and one using ARRAYs. How many rows do you want for each ID? The code that you show suggests that you intend to collapse the multiple ID entries into one row, transposing the Prod_CAT values into column names. Could you supply some data and show how you want the result to appear?

RogerSpeas · ‎02-02-2017

You didn't post a CSV file for input, so I was left to write and post the code to generate the ITEM table. So when you ran either of my code approaches, you didn't get the results you wanted, hmmm? Was there an error? Or were the results not what you envisioned? For those who may not have run the code, I've included the result below. From the supplied code that created the seeded, randomly generated table ITEM, the final table shown below is the summarization of the ITEM table, where each cust_edp_id's prod_cat values are concatenated into the column ALL_CATS. You can compare the ALL_CATS list with the dummy variables on each row. Could you please help me understand how my result differs from the result you desire?

cust_edp_id all_cats pcat_AA pcat_AB pcat_AC pcat_AD pcat_AE pcat_AF pcat_AG pcat_AH pcat_AI pcat_AJ pcat_AK pcat_AL pcat_AM pcat_AN pcat_AO pcat_AP pcat_AQ pcat_AR pcat_AS pcat_AT pcat_AU pcat_AV pcat_AW pcat_AX pcat_AY pcat_AZ pcat_B pcat_C pcat_D pcat_E pcat_F pcat_G pcat_H pcat_I pcat_J pcat_K pcat_L pcat_M pcat_N pcat_O pcat_P pcat_Q pcat_R pcat_S pcat_T pcat_U pcat_V pcat_W pcat_X pcat_Y pcat_Z
1 AV-V-AG-AH-AX-AD-Q-J 0 0 0 1 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0
2 AM-C-N-AG-P-J-AV-G 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 1 0 0 1 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0
3 AP-AK-AV-AP-AZ-F-AL-AQ-AD-AD-S-C-O-AL-AG-AX-L-N-AE-AL-F 0 0 0 1 1 0 1 0 0 0 1 1 0 0 0 1 1 0 0 0 0 1 0 1 0 1 0 1 0 0 1 0 0 0 0 0 1 0 1 1 0 0 0 1 0 0 0 0 0 0 0
4 X-AN-Y-AB-K-H-AR-AK-AW-AG-E-F-AD-AG-AH-AJ-AI-AB-O-AS 0 1 0 1 0 0 1 1 1 1 1 0 0 1 0 0 0 1 1 0 0 0 1 0 0 0 0 0 0 1 1 0 1 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 1 1 0
5 F-I-Z-AG-J-AC-L-R 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 1 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 1
6 T-AL-D-AZ-J-U-K-B-E-AP-AO-AC-AS-AU-S-AZ-T-J-AY 0 0 1 0 0 0 0 0 0 0 0 1 0 0 1 1 0 0 1 0 1 0 0 0 1 1 1 0 1 1 0 0 0 0 1 1 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0
7 G-U-V-Z-AJ-I-W 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 1
8 J-Q-I-G-G-V-T-AN-Y-C 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 1 1 0 0 0 0 0 0 1 0 0 1 0 1 0 0 1 0
9 T-AW-AU-S-X-AC-AK-J-Y-S 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 1 0 0 0 1 1 0
10 AG-AG-L-AG-AV-E-E-K-X-I-AH-R-AR-AS 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 1 1 0 0 1 0 0 0 0 0 0 0 1 0 0 0 1 0 1 1 0 0 0 0 0 1 0 0 0 0 0 1 0 0
11 AH-AW-Y-AG-AV-AC-K-AS-AO-C-AA-AQ-F-B-Y-C-AA-Y-Z-AT-C-L-AM 1 0 1 0 0 0 1 1 0 0 0 0 1 0 1 0 1 0 1 1 0 1 1 0 0 0 1 1 0 0 1 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 1 1
12 C-AT-M-H-AZ-T-AR-Y-Y-B-T-W-AJ-T-AX-AC-Q-Y 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 1 0 0 0 1 0 1 1 1 0 0 0 0 1 0 0 0 0 1 0 0 0 1 0 0 1 0 0 1 0 1 0
13 V-J-AK-Z-V-AW-B-W 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 1
14 Y-AV-AM-AC-L-AU-AB-AR-B-L-R-AX-N-X-Z-AV-X-K-AA-AN 1 1 1 0 0 0 0 0 0 0 0 0 1 1 0 0 0 1 0 0 1 1 0 1 0 0 1 0 0 0 0 0 0 0 0 1 1 0 1 0 0 0 1 0 0 0 0 0 1 1 1
15 Z-AW-AF-AX-AT-AO-V-V-B-AZ-Q-T-AA-M-AP-AY-AR-AO-O-AO-B-Q-AQ 1 0 0 0 0 1 0 0 0 0 0 0 0 0 1 1 1 1 0 1 0 0 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 1 0 1 0 1 0 0 1 0 1 0 0 0 1
16 AE-AS-Y-B-J-AK-AK-AQ-AT-Y 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 1 0 1 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0
17 N-H-AR-S-O-AJ-K-S-AY-K-AW-Q-AJ-V-F-AE-AZ-AZ-V-AV-AU-AL 0 0 0 0 1 0 0 0 0 1 0 1 0 0 0 0 0 1 0 0 1 1 1 0 1 1 0 0 0 0 1 0 1 0 0 1 0 0 1 1 0 1 0 1 0 0 1 0 0 0 0
18 AB-AF-Z-E-AC-AP-W-I-O-J-V-AG-Q-S-L-R-I-U 0 1 1 0 0 1 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 1 0 1 0 0 1 0 1 1 1 0 1 1 1 0 0 1
19 AX-AB-R-X-AJ-N-AN-F-AJ 0 1 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 1 0 0
20 H-R-AO-AU-AR-AT-Y-Y-W-X-AD-AF-Y-S-AR-Z 0 0 0 1 0 1 0 0 0 0 0 0 0 0 1 0 0 1 0 1 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 1 0 0 0 1 1 1

RogerSpeas · ‎02-01-2017

Are you wishing to know the number of times each of the categories was purchased? Instead of just flagging the variable with 1, just sum up the occurrences. The WHEN expression would look a little like this...

when .... pcat_aa+1;
when .... pcat_ab+1;
...
when .... pcat_z+1;

With regard to the suggested TRANSPOSE code, I was assuming just a flag, FOUND, was needed. To deal with summarization, you need to pre-process the table with PROC MEANS or SQL...

class id prod_cat;
output out=counts(rename=(_freq_=FOUND));

or

select id, prod_cat, count(*) as FOUND from temp group by 1,2;

and then transpose. The advantage of using PROC MEANS is that you can use a CLASSDATA table to fully represent all class levels, AA-AZ and B-Z, for each ID. In my FORMAT example farther below, you could modify the "LIST" table to be used as a CLASSDATA table. Using a CLASSDATA table, you would avoid having to reprocess the result to change missing values to zeros.
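A minimal sketch of the SQL count-then-transpose route, assuming the TEMP table from the PROC TRANSPOSE example (columns cust_edp_id and prod_cat, with prod_cat already prefixed with pcat_). The table names COUNTS and WIDECOUNTS are made up for illustration.

```sas
/* Assumption: TEMP has cust_edp_id and prod_cat (values like pcat_AA). */
proc sql;
  create table counts as
  select cust_edp_id, prod_cat, count(*) as found
  from temp
  group by cust_edp_id, prod_cat
  order by cust_edp_id, prod_cat;   /* BY processing below needs sorted input */
quit;

/* One row per customer, one column per category, holding the count */
proc transpose data=counts out=wideCounts(drop=_name_);
  by cust_edp_id;
  id prod_cat;
  var found;
run;
```

Categories that an ID never bought come out as missing rather than 0 here, which is the extra reprocessing step that the CLASSDATA approach is meant to avoid.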

RogerSpeas · ‎02-01-2017

And you could use a format to do a fast position lookup in the array. You can create a kind of vlookup table with the data step and load the letter-to-position table into a format: AA=1 AB=2... Z=51. You could build a temporary array full of zeros to initialize the FIRST row, or you can do it the lazy way and build a single-row table with zero values.

*Generate variable list pcat_AA--pcat_AZ pcat_B-pcat_Z and a DATA set with positional lookup values AA=1 AB=2..... Z=51;
data list;
  length name $ 500 start $ 2;
  retain fmtname '$findit';
  do i=65 to 90;
    letter=input(put(i, hex2.), $hex2.);
    name=catx(' ', name, cats('pcat_A', letter));
    n+1;
    start=cats('A', letter);
    label=put(n, 2.);
    output;
  end;
  do i=66 to 90;
    letter=input(put(i, hex2.), $hex2.);
    name=catx(' ', name, cats('pcat_', letter));
    n+1;
    start=cats(letter);
    label=put(n, 2.);
    output;
  end;
  put name=;
  call symputx('name_order', name);
  drop i name letter;
run;

proc format cntlin=list fmtlib;
  select $findit;
run;

*Generate a row of zeros;
data row0;
  array cvar{51} &name_order (51*0);
  output;
run;

data temp2;
  set item;
  by cust_edp_id;
  if substr(curr_cat,1,1) = 'A' then prod_cat=substr(curr_cat,1,2);
  else prod_cat=substr(curr_cat,1,1);
  length all_cats $ 300; *may contain duplicates;
  all_cats=catx('-', all_cats, prod_cat);
  retain all_cats;
  if 0 then set row0; *sets the variable order;
  array pc{*} &name_order;
  if first.cust_edp_id=1 then do;
    _row=1;
    set row0 point=_row; *initialize to zero;
  end;
  _pos=input(put(prod_cat, $findit.), 2.);
  pc{_pos}=1;
  if last.cust_edp_id=1 then do;
    output;
    all_cats=' ';
  end;
  *drop curr_cat prod_cat;
  drop _pos;
run;
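To see the lookup idea in isolation, here is a small sketch with a hand-written format. The format name $pos and its three entries are hypothetical stand-ins for the generated $findit format, which covers all 51 categories.

```sas
/* Hypothetical three-entry format; the real $findit. covers AA..AZ and B..Z. */
proc format;
  value $pos 'AA'='1' 'AB'='2' 'B'='27';
run;

data _null_;
  array flag{51} _temporary_ (51*0);   /* all flags start at zero */
  length cat $ 2;
  do cat='AB', 'B';
    pos=input(put(cat, $pos.), 2.);    /* category -> array position */
    flag{pos}=1;
    put cat= pos=;
  end;
run;
```

The PUT function turns the category into its position as text and INPUT converts that text to a number, so no searching of the array is needed; the log should show position 2 for AB and 27 for B.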

RogerSpeas · ‎02-01-2017

I used the following to generate the ITEM data set...

*Generated Random data;
data item;
  do cust_edp_id=1 to 20;
    _occur=ceil(20*ranuni(6)) + 5;
    length curr_Cat $ 2;
    do _i=1 to _occur;
      _letter=input(put(ceil(26*ranuni(6))+64, hex2.), $hex2.); *Generate a random letter... 65=A 66=B ;
      if ranuni(6)<.5 then curr_cat=cats('A', _letter);
      else do;
        _letter1=input(put(ceil(25*ranuni(6))+65, hex2.), $hex2.); *Generate a random letter... B-Z;
        curr_cat=cats(_letter1, _letter);
      end;
      output;
    end;
  end;
  drop _:;
run;

RogerSpeas · ‎02-01-2017

You could use PROC TRANSPOSE, but it's not as efficient, as it re-reads the data. But it might be simpler to understand...

data temp;
  length prod_Cat $ 10;
  set item;
  if substr(curr_cat,1,1) = 'A' then prod_cat=substr(curr_cat,1,2);
  else prod_cat=substr(curr_cat,1,1);
  prod_cat=cats('pcat_', prod_cat);
  found=1;
run;

proc sort data=temp(keep=cust_edp_id prod_cat found) out=sortTemp nodupkey;
  by cust_edp_id prod_cat;
run;

proc transpose data=sortTemp out=wideTemp;
  by cust_edp_id;
  id prod_Cat;
  var found;
run;

*Generate a row of zeros (&name_order comes from the LIST data step in the format example);
data row0;
  array cvar{51} &name_order (51*0);
  output;
run;

data wideTemp_filled;
  if 0 then set row0; *sets the variable order;
  set widetemp;
  array pc{*} pcat_: ; *numeric flags, so no $ on the array;
  do i=1 to dim(pc);
    if pc{i}=. then pc{i}=0;
  end;
  drop i;
run;

RogerSpeas · ‎02-03-2016

You note that UPLOAD uses the SAS OLEDB local data provider.

Q1: Is the OLEDB provider dependent on the bitness of the SAS table? Supposing the providers are different (32/64) and are dependent on the bitness of the SAS table... if one installs EG32 or EG64 (no local SAS), are both the SAS OLE32 and OLE64 data providers installed?

Q2: If a SAS table is loaded from one OS, Windows, to another OS, Unix... will the table be migrated so the cross-platform engine on the Unix server would need to be utilized (32 migrates to 64)? (I might surmise that if the OLEDB driver is being used, the SAS table is not going through a simple binary/ftp transfer.)

Thanks

RogerSpeas · ‎11-24-2015

Oh, that.... LT 0 includes missing and you only coded boolean logic, not ternary logic... Which is why I used.. sum(b.profit<0 and b.profit is not missing) Additionally, you don't need a GROUP BY, as you are running a correlated query. But, as you now need three-value logic based on a different set of conditions, I believe you'll need to run two correlated queries in SAS_STAT's original code...

proc sql;
  create table want as
  select *,
    case
      when (select count(*) from have where coid=a.coid and year le 2000 and (profit lt 0 and profit is not missing)) gt 1 then 1
      when (select count(*) from have where coid=a.coid and year le 2000 and (profit is not missing)) gt 1 then 0
      else .
    end as MorethanTwo
  from have as a
  order by coid, year;
quit;

Sorry, as this might come off as lecturing, but this is SAScommunities, and not meant to be SQLcommunities. I was just hoping to promote the efficiency of SAS code even if it is SQL... however, new lesson, old story, MASK variables. The summary query could be written so as to capture Found2 (negative) and Any2 (reported). However, returning two columns would not make it suitable for the correlated query in the CASE expression, which can only return one column. So I was thinking, we could improve SAS_STAT's code by MASKing the results from the two columns...

proc sql;
  select coid,
    sum(profit lt 0 and profit is not missing) gt 1 as Found2,
    sum(profit is not missing) gt 1 as Any2
  from have as mtr
  where mtr.year le 2000
  group by coid;
  *Combine the columns into one mask variable... ;
  select coid,
    ((sum(profit lt 0 and profit is not missing) gt 1) * 10) + (sum(mtr.profit is not missing) gt 1) as Mask
  from have as MTR
  where mtr.year le 2000
  group by mtr.coid;
quit;

So, let's go back to your original question and SAS_STAT's code and insert the MASK query, so instead of possibly two subqueries for each row, you could have just ONE correlated subquery for every row...

*So the correlated query might look like this... ;
proc sql;
  create table want as
  select *,
    case (select ((sum(profit lt 0 and profit is not missing) gt 1) * 10) + (sum(profit is not missing) gt 1) as Mask from have where coid=a.coid and year le 2000)
      when (11) then 1
      when (01) then 0
      when (00) then .
      else .
    end as MoreThanTwo
  from have as a
  order by coid, year;
quit;

Typically, I would say that you should avoid the correlated query when you can create a summary table. The ternary logic: Found2 and Any2 both set (11) gives 1, Any2 only (01) gives 0, and neither (00) gives missing. Thus adding a third left join might look a little like this...

proc sql;
  select have.*, y2.TwoInRow, y3.ThreeInRow,
    case (y4.mask) when (11) then 1 when (01) then 0 when (00) then . else . end as MoreThanTwo
  from (have .... ) as y3
  left join ( ... MASK query here or Found2/Any2 query... ) as y4
  on y1.coid = y4.coid;

The alternative SAS code, with likely one pass of the data...

data need;
  set have(keep=coid year profit); *first pass check for runs;
  by coid;
  if first.coid then do;
    count=0; count_nm=0;
    call missing(TwoInRow, ThreeInRow, Found2, Any2);
    retain TwoInRow ThreeInRow Found2 Any2;
  end;
  if year<=2000 then do; *exclude if check for all years;
    if (first.coid ne 1 and year-1 ne lag(year)) or profit=. then do;
      count=0; count_nm=0;
    end; *reset counters, non-neg and any sequential year counters;
    else do; *positive and negative profits, missing excluded;
      count_nm+1; *count all non-missing;
      Any2+1;
      if profit<0 then do;
        count+1; *count continuous neg;
        Found2+1; *any neg count;
      end;
      else do; *non-negative profit;
        count=0; *reset continuous neg;
      end;
    end;
    *Ternary logic -- zero, one, default missing ;
    if count_nm=2 and TwoInRow=. then TwoInRow=0; *Trigger zero value;
    else if count_nm=3 and ThreeInRow=. then ThreeInRow=0;
    if count=2 then TwoInRow=1; *Trigger one value;
    else if count=3 then ThreeInRow=1;
    drop count: ;
  end;
  if last.coid;
  if Found2>1 then MoreThan2=1;
  else if Any2>1 then MoreThan2=0;
  *drop any2 found2;
  do until(last.coid); *if last, re-read the BY group;
    set have;
    by coid; *reset first./last. ;
    output;
  end;
run;
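The point that LT 0 includes missing can be checked on a tiny table. This is a sketch only; the DEMO table and the column aliases are made up for illustration.

```sas
/* Hypothetical three-row table: one negative, one missing, one positive */
data demo;
  input coid profit;
  datalines;
1 -5
1 .
1 10
;
run;

proc sql;
  /* A SAS missing value sorts below every number, so (profit lt 0) is true for it */
  select sum(profit lt 0) as lt0_all,
         sum(profit lt 0 and profit is not missing) as lt0_nonmissing
  from demo;
quit;
```

Here lt0_all comes back as 2 and lt0_nonmissing as 1, which is why the extra "is not missing" test is needed before counting negative years.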

RogerSpeas · ‎11-24-2015

I wasn't sure of your ternary logic (it seems a little contradictory)...

*MoreThanTwo Three Years Two Years . 0 1 . . d.n.e d.n.e 0 1 0 d.n.e 1 1 1 1

It appears that if TwoYears=0, you want MoreThanTwo to be '1'? Is that correct ("456")?

select have.*, y2.TwoInRow, y3.ThreeInRow,
  case
    when y2.TwoInRow=1 then 1
    when y2.TwoInRow=0 and y3.ThreeInRow=0 then 0
    else .
  end as MoreThanTwo
from ... ;

In the table you show situations where TwoInRow and ThreeInRow are both missing ("890" and "892"), but they have different ternary results: missing in one case and 0 in the other? The program logic for MoreThanTwo applies to the results table that is shown above. Do you have more logic to consider? In other words, what makes MoreThanTwo different from TwoInRow?

RogerSpeas · ‎11-23-2015

Looks like there was an errant comma...

create table twoYear as
select a.coid, a.year,
  sum(b.profit<0 and b.profit is not missing) as TwoBadYears,
  sum(b.profit is not missing) as C2
from have as a, have as b
where a.coid=b.coid and (a.year between b.year-0 and b.year-1 and a.year<=1999)
group by a.coid, a.year;

... before the FROM clause. I suppose you realize that this was intermediate code to explain the 40 or so lines of SQLgetti that followed (which did not have the errant comma). Good luck

RogerSpeas · ‎11-20-2015

Yup... I missed the transpose. I saw col1 but was thinking var1. However, the code is recursive, as the subquery is a correlated subquery....

proc sql;
  create table wantList as
  select subj, scoreId,
    case
      when col1 is missing then .
      when col1 < 50 then (select min(col1) from list where scoreId=a.scoreId and col1>= 50)
      when col1 > 500 then (select max(col1) from list where scoreId=a.scoreId and col1<= 500)
      else col1
    end as score
  from list as a
  order by subj, scoreId;
quit;

Let's say the table is 1000 rows long and 85 columns wide... the transposed table would be 84000 rows/values. For each of those 84000 rows, if an outlier occurs, the summary subqueries would need to scan the LIST table. So if the out-of-range values were rare in the WANTLIST table, let's say 20 values out of range, the LIST table would be scanned/subqueried 20 times. Additionally, if the table is small, the read could be from cache.

data have;
  array var(84);
  do subj=1 to 1000;
    do i=1 to 84;
      drop i;
      var{i}=(ranuni(10)*480) + 30;
    end;
    output;
  end;
run;

proc transpose data=have out=list(index=(scoreId)) name=scoreId;
  by subj;
  var var:;
run;

options msglevel=i;
proc sql _method;
  create table wantList as ...

And yes, the impact on I/O from the recursive correlated query could be reduced if an INDEX existed. However, the info notes show that the use of the INDEX would be cancelled... again, likely because the table is somewhat small. Alternatively, one could pre-summarize the low/high values into a table and JOIN. As the low/high table (3x84) would be quite small, I would agree that a JOIN would likely use the HASH (as it did below).

proc sql _method;
  create table lowHigh as
  select scoreid, min(col1) as min, max(col1) as max
  from list
  where col1 between 50 and 500
  group by scoreid;
  create table wantList as
  select subj, l.scoreId,
    case
      when col1 is missing then .
      when col1 < 50 then min
      when col1 > 500 then max
      else col1
    end as score
  from list as l, lowHigh as s
  where l.scoreid = s.scoreid
  order by subj, l.scoreId;
quit;

So, I would agree that transposing the table in SAS simplifies the problem. My previous post was to illustrate that SQL-centric code could be code intense and/or I/O intense. Either using ARRAYs to deal horizontally with the number of columns or using TRANSPOSE to array those columns vertically would significantly reduce the code and I/O of the pure SQL alternative. Here: transpose, summary query, bottom/top code query, transpose... 4 table scans. Whichever method is used, I believe it is useful to create a summary table, lowHigh, to join back with the detail data. In my prior post, I was just trying to point out that the recoded view would simplify Reeza's PROC MEANS example, and could also simplify some of the pure SQL summary code.

RogerSpeas · ‎11-20-2015

Did I understand that you were trying to set the bottom/top (low and high) of 84 columns independently? If so, the SQL magic might require that you write 168 SQL sub-queries, and thus 168 table scans. Astounding's code, using the data step, reads the table just twice. But... we could super-charge the SQL code. I might suggest adding some SAS magic dust, which could reduce the 168 table scans into one scan for summary totals.

data classv / view=classv;
  set sashelp.class;
  array n(*) age--weight;
  do i=1 to dim(n);
    drop i;
    if n{i}<=50 then n{i}=.;
    else if n{i}>=100 then n{i}=.;
  end;
run;

The SQL magic would look like this...

proc sql;
  create view classv as
  select
    case when (var1 not between 50 and 500) then . else var1 end as var1
    ...
    case when (var84 not between 50 and 500) then . else var84 end as var84
  from class;
quit;

(we won't split hairs yet, but, recoding out-of-range values to missing... 7 lines of data step view code vs 88 lines of SQL view code)

What's my point... the magic here is to first recode the values outside of the 50-500 range to missing, without reading/writing the table, by using a view, SQL or DATA, your preference. With values recoded to missing, and unlike the subqueries, which required independent WHERE expressions, the SQL MIN/MAX summaries of this view return the lowest and highest in-range values directly. There is no need for independent WHERE expressions; nothing is outside the range for any of the columns except missing. So, by recoding extreme values to missing, we can summarize in one pass and put the results in a table (or inline view)...

proc sql;
  create table lowhigh as
  select min(age) as age_min, max(age) as age_max,
    ...
    min(var84) as var84_min, max(var84) as var84_max
  from classv;
quit;

The point again is to avoid 168 subqueries/table scans. However, the summary query would still need to have 168 summary expressions written for the 84 columns... so maybe another 88 lines of SQL code.

Now, Reeza's suggestion of PROC MEANS was later recanted, when it was realized that one might be looking at writing 84 PROC MEANS steps, each with an independent WHERE expression, for 84 table scans, just like the subqueries. However, the magic of the recoded-to-missing view is that one can easily consume the view with a single pass of PROC MEANS, much like SQL.

data classv / view=classv;
  set sashelp.class;
  array n(*) age--weight;
  do i=1 to dim(n);
    drop i;
    if n{i}<=50 then n{i}=.;
    else if n{i}>=100 then n{i}=.;
  end;
run;

proc means data=classv noprint;
  var age--weight;
  output out=lowhigh min()= max()= / autoname;
run;

If you can live with some more SAS magic, the autoname option avoids writing those pesky 168 SQL summary expressions. But, wait, we still have to do the bottom/top coding. We would next do a blind many-to-one join of the lowhigh bottom/top coding values back into the detail table...

proc sql;
  create view classv as
  select
    case when (var1<=50) then var1_min when (var1>=500) then var1_max else var1 end as var1,
    ....
    case when (var84<=50) then var84_min when (var84>=500) then var84_max else var84 end as var84
  from class, lowhigh;
quit;

We can call that, what, 84 case expressions or 250+ lines of code. So in SQL: the view to recode to missing, 84 lines; finding the lowhigh summary table, 168 summary expressions; and finally bottom/top-coding the 84 columns, 84 case expressions... but just two table scans. (Albeit, you could write a macro program.)

The data step code for bottom/top coding...

data newclass;
  set sashelp.class;
  if _n_=1 then set lowhigh;
  array actual{3} age--weight;
  array stat{2,3} age_min--weight_max;
  do _i=1 to dim(actual);
    if stat{1,_i}=. then actual{_i}=75;
    else if actual{_i}<=50 then actual{_i}=stat{1,_i};
    else if actual{_i}>=100 then actual{_i}=stat{2,_i};
  end;
  drop _: age_min--weight_max;
run;

Data step view, 8 lines of code; the PROC MEANS summarization, 4 lines of code; the bottom/top coding, 10 lines of code (all of which could be tightened). Astounding's single data step code, 20 lines. Where's the beef/magic?
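On the aside that a macro program could write the 84 CASE expressions, a rough sketch follows. The macro name topcode_cases, its parameters, and the view name TOPCODED are made up for illustration; CLASS and LOWHIGH are assumed to hold var1-var84 and the matching var1_min/var1_max style columns. The missing-value branch is an addition, since in SAS a missing value also satisfies <= 50.

```sas
/* Hypothetical helper: emits one CASE expression per column, comma separated */
%macro topcode_cases(n=84, low=50, high=500);
  %local i;
  %do i = 1 %to &n;
    %if &i > 1 %then %str(,);
    case when var&i is missing then .
         when var&i <= &low  then var&i._min
         when var&i >= &high then var&i._max
         else var&i
    end as var&i
  %end;
%mend topcode_cases;

proc sql;
  create view topcoded as
  select %topcode_cases(n=84)
  from class, lowhigh;   /* LOWHIGH has one row, so this is a many-to-one join */
quit;
```

This keeps the written SQL short, but the generated statement is still 84 CASE expressions long, so the array-based data step remains the more compact option.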

RogerSpeas · ‎11-19-2015

Given this data...

data have;
  infile datalines truncover;
  input coid year profit ta;
datalines;
123 1995 -100 1
123 1996 -100 2
123 1997 -100 3
123 1998 -100 4
123 1999 -100 5
123 2000 -150 6
123 2001 200 7
223 1995 -100 1
223 1996 -100 2
223 1998 10 3
223 1999 -100 4
223 2000 -150 5
223 2001 200 6
456 1996 -10 1
456 1997 -10 2
456 1998 . 3
456 1999 -10 4
456 2000 -10 5
456 2001 . 6
789 1997 -100 0
789 1998 . 1
789 1999 100 2
789 2000 -100 3
789 2001 . 4
789 2002 . 5
890 1998 . 1
890 1999 . 2
890 2000 -100 3
890 2001 -100 4
890 2002 100 5
890 2003 200 6
890 2004 300 7
891 1997 100 1
891 1998 -100 2
891 1999 -200 3
891 2000 200 4
891 2001 200 5
891 2002 100 6
892 1998 -100 1
892 1999 . 2
892 2000 100 3
892 2001 100 4
;
run;

You would create rolling totals of the number of negative and the number of non-missing values for each year prior to 2000, the threeYear and twoYear tables below. As they would represent multiple years prior to 2000, you would then need to summarize the annual rolling counts to determine the ternary logic. So if there are three negative years, then "1"; else if there are three non-missing values, then "0"; otherwise missing. You can review the rolling totals in the tables, and the pre-ternary summary results from the select clauses.

proc sql;
  create table threeYear as
  select a.coid, a.year,
    sum(b.profit<0 and b.profit is not missing) as ThreeBadYears,
    sum(b.profit is not missing) as c3
  from have as a, have as b
  where a.coid=b.coid and (a.year between b.year-0 and b.year-2 and a.year<=1998)
  group by a.coid, a.year;
  select coid, max(threeBadYears) as Bad3, min(c3) as NM3 from threeyear group by coid;
  *if Bad3=3 and NM3=3 then 1 if Bad3<3 and NM3=3 then 0 if NM3<3 then . ;
  create table twoYear as
  select a.coid, a.year,
    sum(b.profit<0 and b.profit is not missing) as TwoBadYears,
    sum(b.profit is not missing) as C2
  from have as a, have as b
  where a.coid=b.coid and (a.year between b.year-0 and b.year-1 and a.year<=1999)
  group by a.coid, a.year;
  select coid, max(TwoBadYears) as Bad2, max(C2) as NM2 from TwoYear group by coid;
  *if Bad2=2 then 1 if NM2=2 then 0 if NM2<2 then . ;
quit;

You can combine those queries as inline views. Instead of selecting the maximum values, you would use them in the CASE expression to determine the ternary value. As not every company might appear in the two-year and three-year rolling total tables, you would need to use a LEFT JOIN so as not to remove companies not represented in these two tables. The ternary values would be returned for companies with pre-2000 data, and the LEFT JOIN would set companies without data to missing.

proc sql;
  select have.*, y2.TwoInRow, y3.ThreeInRow
  from (have
    left join
      (select coid,
         case when (max(TwoBadYears)=2) then 1 when (max(C2)=2) then 0 else . end as TwoInRow
       from (select a.coid, a.year,
               sum(b.profit<0 and b.profit is not missing) as TwoBadYears,
               sum(b.profit is not missing) as C2
             from have as a, have as b
             where a.coid=b.coid and (a.year between b.year-1 and b.year+0 and a.year<=1999)
             group by a.coid, a.year)
       group by coid) as Y2
    on have.coid = y2.coid)
    left join
      (select coid,
         case when (max(ThreeBadYears)=3) then 1 when (max(C3)=3) then 0 end as ThreeInRow
       from (select c.coid, c.year,
               sum(d.profit<0 and d.profit is not missing) as ThreeBadYears,
               sum(d.profit is not missing) as C3
             from have as c, have as d
             where c.coid=d.coid and (c.year between d.year-2 and d.year+0 and c.year<=1998)
             group by c.coid, c.year)
       group by coid) as Y3
    on have.coid=y3.coid;
quit;

The SQL code is highly recursive, as SQL does not handle sequential processing as efficiently as the data step. The HAVE table is read 5 times, and 4 intermediate summary tables are re-read as inline views (albeit they are narrow tables, and if HAVE is small the OS might have cached the file). The DATA step is likely to read the table just once with a smaller memory/cpu footprint.

RogerSpeas · ‎11-13-2015

My bad, I thought the objective was different formats on each row...

Obs Obs small medium large elem middle high all
1 1 I II III I 0002 0003 0004
4 1 one two three one two three four

data row1;
  infile datalines dsd truncover;
  input small medium large elem middle high all;
  format small--elem roman6. middle--all z4.;
datalines;
1,2,3,1,2,3,4
;
run;

data row2;
  infile datalines dsd truncover;
  input small medium large elem middle high all;
  format _all_ words.;
datalines;
1,2,3,1,2,3,4
;
run;

ods csv file='c:\temp\schools.csv';
proc print data=row1; run;
proc print data=row2; run;
ods csv close;

proc import datafile='c:\temp\schools.csv' dbms=csv out=stacked replace;
  getnames=yes;
run;

proc print data=stacked;
  where obs = 1;
run;

Online Status	Offline
Date Last Visited	‎09-09-2017 05:10 AM

Re: help from an array expert

Re: Upload/Download SAS Data Sets task for Enterprise Guide 4.1

Re: Help with Proc SQL step

Re: CALL SYMPUT vs CALL SYMPUTX

Re: How to reduce outlier values to next highest value within range

Re: Formatting question