About chang_y_chung_hotmail_com

chang_y_chung_hotmail_com · 05-02-2012

@Yura2301 I would love to see your implementation of Tarjan's algorithm. In any case, here is a much easier way: just call the strongComp() function in the RBGL package of R. I am running SAS 9.3 and R 2.13.0 on Windows 7.

data one;
  input (from to) (:$1.) @@;
  weight = 1;
cards;
1 2  2 6  2 8  6 2  6 8  6 9  3 3  4 4  5 7  8 2  8 6  9 6
;
run;

proc iml;
  run ExportDataSetToR("work.one", "edgelist");
  submit / r;
    library("graph")
    library("RBGL")
    g <- graphBAM(edgelist, edgemode="directed")
    # find the strong components and return them as a data.frame
    sc <- strongComp(g)
    cs <- c()
    ns <- c()
    for (i in 1:length(sc)) {
      len <- length(sc[[i]])
      cs <- c(cs, rep(i, times=len))
      ns <- c(ns, sc[[i]])
    }
    strongComp <- data.frame(node=ns, comp=cs)
  endsubmit;
  call ImportDataSetFromR("work.two", "strongComp");
quit;

/* check */
proc print data=two noobs;
run;
/* on lst
node  comp
  2     1
  6     1
  8     1
  9     1
  1     2
  3     3
  4     4
  7     5
  5     6
*/
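For comparison with the RBGL route, here is a minimal sketch of Tarjan's strongly connected components algorithm itself, written in Python rather than SAS and run on the same edge list. It is not @Yura2301's implementation; the function name tarjan_scc is my own, and it uses plain recursion, so a very large graph would need a raised recursion limit or an iterative rewrite.

```python
def tarjan_scc(edges):
    """Strongly connected components of a directed graph via Tarjan's algorithm."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, []).append(v)
        graph.setdefault(v, [])

    index = {}        # DFS discovery number of each node
    lowlink = {}      # smallest discovery number reachable from the node's subtree
    on_stack = set()
    stack = []
    components = []
    counter = 0

    def visit(u):
        nonlocal counter
        index[u] = lowlink[u] = counter
        counter += 1
        stack.append(u)
        on_stack.add(u)
        for v in graph[u]:
            if v not in index:           # tree edge: recurse, then propagate lowlink
                visit(v)
                lowlink[u] = min(lowlink[u], lowlink[v])
            elif v in on_stack:          # edge back into the component being built
                lowlink[u] = min(lowlink[u], index[v])
        if lowlink[u] == index[u]:       # u is the root of a component: pop it off
            comp = []
            while True:
                w = stack.pop()
                on_stack.discard(w)
                comp.append(w)
                if w == u:
                    break
            components.append(comp)

    for node in graph:
        if node not in index:
            visit(node)
    return components


edges = [("1", "2"), ("2", "6"), ("2", "8"), ("6", "2"), ("6", "8"), ("6", "9"),
         ("3", "3"), ("4", "4"), ("5", "7"), ("8", "2"), ("8", "6"), ("9", "6")]
for i, comp in enumerate(tarjan_scc(edges), start=1):
    print(i, sorted(comp))
```

On this edge list the sketch finds the same six components as the RBGL listing, {2, 6, 8, 9} plus the singletons 1, 3, 4, 7 and 5, and happens to list them in the same order.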

chang_y_chung_hotmail_com · 04-26-2012

I would reshape the data to long form and use a simple data step to extract the relevant observations before tabulating. The useful coding pattern in the data step is the "Double-DoW" (search SAS-L for examples). Below, I assume that you have imported your Excel sheet into a dataset, work.one:

/* reshape to long */
data two;
  length id week numabb 8 change $40.;
  set one;
  change1 = repeat(" ", 40-1);
  array num[1:16] numabb1-numabb16;
  array chg[1:16] $ change1-change16;
  do week = 1 to 16;
    numabb = num[week];
    change = chg[week];
    output;
    keep id week numabb change;
  end;
run;

/* what happened to those with four-week trials opened in weeks 1 to 4,
   8 (or up to the available) weeks later?
   if two or more TRIAL_4W trials were opened in the period, then we take
   the latest one. see id=7098 */
data three;
  /* double DoW */
  opened = 0;
  do until (last.id);
    set two;
    by id;
    if (1<=week<=4 and change="OPENED to TRIAL_4W") then opened = week;
  end;
  do until (last.id);
    set two;
    by id;
    if opened>0 and week = opened + min(opened+8, 16) then output;
  end;
run;

proc freq data=three;
  tables change/list missing;
run;
/* on lst
                        The FREQ Procedure

                                                      Cumulative  Cumulative
change                            Frequency  Percent   Frequency     Percent
----------------------------------------------------------------------------
                                        228    86.36         228       86.36
CAMPAIGN_10W to CAMPAIGN_10W              1     0.38         229       86.74
CAMPAIGN_13W to CAMPAIGN_13W              1     0.38         230       87.12
CLOSED from FREE_05                       1     0.38         231       87.50
CLOSED from TRIAL_13W                     1     0.38         232       87.88
FREE_05 to FREE_05                        1     0.38         233       88.26
MIDL to MIDL                              1     0.38         234       88.64
NORMAL_13W to NORMAL_13W                 20     7.58         254       96.21
NORMAL_52W to NORMAL_52W                  2     0.76         256       96.97
NORMAL_KVARTAL to NORMAL_KVARTAL          3     1.14         259       98.11
OPENED to CAMPAIGN_13W                    1     0.38         260       98.48
OPENED to NORMAL_13W                      1     0.38         261       98.86
TRIAL_13W to TRIAL_13W                    2     0.76         263       99.62
TRIAL_26W to TRIAL_26W                    1     0.38         264      100.00
*/
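The Double-DoW idea is to read each id's group twice: the first pass computes a group-level value (here, the latest week 1-4 in which a four-week trial was opened) and the second pass uses it to pick rows. A rough Python analogue on data already in long form is below. The tuple layout, the function name after_trial, and the target week min(opened + lag, last_week) are assumptions of this sketch: it follows the comment's wording "8 (or up to the available) weeks later", so adjust the target expression if you need to reproduce the SAS step exactly.

```python
from itertools import groupby


def after_trial(rows, lag=8, last_week=16):
    """rows: (id, week, change) tuples sorted by id, then week.

    Hypothetical helper: for each id, find the latest week 1-4 with an
    "OPENED to TRIAL_4W" change, then return that id's row `lag` weeks
    later, capped at `last_week`.
    """
    selected = []
    for _, grp in groupby(rows, key=lambda r: r[0]):
        grp = list(grp)  # materialize so the group can be scanned twice
        # first pass: the latest qualifying opening wins
        opened = 0
        for _id, week, change in grp:
            if 1 <= week <= 4 and change == "OPENED to TRIAL_4W":
                opened = week
        if opened == 0:
            continue
        # second pass: keep only the row at the target week
        target = min(opened + lag, last_week)
        selected.extend(r for r in grp if r[1] == target)
    return selected
```

Materializing each group with list() plays the role of the second SET statement: it lets the same group be read again after the flag is known.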

chang_y_chung_hotmail_com · 04-19-2012

Here is an alternative approach -- reading one character at a time.

%let pwd = z:\;
data names;
  infile "&pwd\names.txt" recfm=n unbuffered eof=output;
  length name $20;
  do while (1);
    input c $1. @@;
    if c = "," then link output;
    else name = catt(name,c);
  end;
  stop;
output:
  name = dequote(name);
  output;
  name = "";
  keep name;
  return;
run;

/* check. first and last three names */
ods _all_ close;
ods listing;
title "first three names";
proc print data=names(obs=3);
run;
title "last three names";
proc print data=names(firstobs=5161 obs=5163);
run;
title;
ods listing close;
/* on Results
first three names
 Obs  name
   1  MARY
   2  PATRICIA
   3  LINDA

last three names
 Obs  name
5161  DARELL
5162  BRODERICK
5163  ALONSO
*/
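The same one-character-at-a-time logic, sketched in Python for illustration: accumulate characters until a comma, then strip the surrounding quotes and emit the name, with end of input flushing the last one (the job eof=output does above). The function name read_names is hypothetical, and strip('"') only approximates DEQUOTE (it does not handle doubled quotes inside a name).

```python
def read_names(text):
    """Split a comma-separated list of double-quoted names, one character at a time."""
    names, current = [], []
    for ch in text:
        if ch == ",":
            # like LINK OUTPUT: dequote the accumulated name and emit it
            names.append("".join(current).strip().strip('"'))
            current = []
        else:
            current.append(ch)
    if current:  # end of input: the last name has no trailing comma
        names.append("".join(current).strip().strip('"'))
    return names
```

For a real file you would pass it open("names.txt").read(); reading the whole file into memory is fine at this size (a few thousand names).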

chang_y_chung_hotmail_com · 04-05-2012

I would prefer running PROC DATASETS to running a whole data step just to rename variables. In any case, the key here is to generate the rename list automatically. Here is another way. Just make sure that you give the retained variable in the data _null_ step a long enough initial value. Hope this helps a bit.

data names have;
  input (a b c d) (:$8.);
  if _n_ = 1 then output names;
  else output have;
cards;
ID Name Sex Country
1 ABC M IND
2 BCD F USA
3 CDE M GER
4 DGE M UK
;
run;

proc transpose data=names out=names(rename=(_name_=old col1=new));
  var _all_;
run;

data _null_;
  retain rename "%sysfunc(repeat(%str( ), 80))";
  set names end=end;
  rename = catx(" ", rename, catx("=", old, new));
  if end then call symputx("rename", rename);
run;
%put rename=***&rename***;
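The key step, pairing each current variable name with the value in the header row and joining the pairs with blanks, is language-independent. Here is a small Python sketch of just that string-building; the function name rename_list is hypothetical, and applying the result (for example in PROC DATASETS) is left to SAS.

```python
def rename_list(old_names, new_names):
    """Build a SAS-style rename list such as 'a=ID b=Name' from two parallel lists."""
    return " ".join(f"{old}={new}" for old, new in zip(old_names, new_names))


header = ["ID", "Name", "Sex", "Country"]  # the first data row of the example
print(rename_list(["a", "b", "c", "d"], header))
```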

chang_y_chung_hotmail_com · 04-05-2012

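Here is an example. %eval returns 1 for true and 0 for false. Note that %eval does integer arithmetic only: because 1.0 is not an integer, the comparison below is done as text, which is why it is false. %sysevalf(1.0 eq 1) does a numeric comparison and returns 1. hth

%put %eval( 1.0 eq 1); %*-- returns 0 for false -- *;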

chang_y_chung_hotmail_com · 02-15-2012

Here is one way. hth

/* test data */
proc plan seed=12345678;
  factors obs=50 ordered rep=100 ordered x=1 of 1000 random;
  output out=sim;
run;

/* mean of x over 100 reps for each obs */
proc means data=sim;
  var x;
  by obs;
run;
/* on lst
obs=1
                       The MEANS Procedure
                     Analysis Variable : x
   N            Mean         Std Dev         Minimum         Maximum
---------------------------------------------------------------------
 100     503.3600000     311.3526864      14.0000000         1000.00
---------------------------------------------------------------------

obs=2
                     Analysis Variable : x
   N            Mean         Std Dev         Minimum         Maximum
---------------------------------------------------------------------
 100     483.4900000     290.8708796       8.0000000     979.0000000
---------------------------------------------------------------------
...
obs=50
                     Analysis Variable : x
   N            Mean         Std Dev         Minimum         Maximum
---------------------------------------------------------------------
 100     467.2800000     273.5758747       2.0000000     999.0000000
---------------------------------------------------------------------
*/
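The design above is 50 obs by 100 reps, each cell drawing one integer from 1 to 1000, followed by a mean per obs. A rough Python equivalent is sketched below. The function name simulate is hypothetical, and because Python's random module is a different generator from the one PROC PLAN uses, it will not reproduce the SAS numbers, only the same kind of result.

```python
import random
from statistics import mean


def simulate(n_obs=50, n_rep=100, upper=1000, seed=12345678):
    """For each obs, draw n_rep integers uniformly from 1..upper and return their mean."""
    rng = random.Random(seed)  # seeded, so results are reproducible run to run
    return {obs: mean(rng.randint(1, upper) for _ in range(n_rep))
            for obs in range(1, n_obs + 1)}


means = simulate()
print(means[1], means[2], means[50])
```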

chang_y_chung_hotmail_com · 01-03-2012

Your path is not resolved by an explicit or implicit %eval(), so you don't need to macro-quote the lone ampersand character at all. The ampersand does have to be hidden from the cmd.exe shell, though, since it is a command separator to that interpreter (see the MSDN article); quoting the path does that. The following worked fine in my SAS session on Windows 7:

%let drive = c:;
%let path = /temp/a & b;
x "&drive & cd ""&path"" & dir *.csv > list.txt";

chang_y_chung_hotmail_com · 12-16-2011

The OP has the right idea in using a hash object to execute this kind of join efficiently. For each observation on one side (say, dataset S), you loop over the possibly multiple matching observations from the other side (dataset Q). The efficiency comes from looping over only those observations that have a matching id, instead of over all the observations in Q. At least, I think that was the OP's intention. Unfortunately, the OP's code loops over all the Q observations for each S observation, so it is not efficient at all. I recommend a careful reading of Ray and Secosky (2008).

/* test data */
data s;
  input id lang $ v1 v2;
cards;
1 a 1 2
2 b 3 7
3 x 8 9
;
run;

data q;
  input id q_lang $ q_v1 q_v2;
cards;
1 a 1 2
2 b 3 4
2 b 5 6
3 c 9 9
;
run;

/* var lists */
%let svars = v1 v2;
%let qvars = %sysfunc(transtrn(%str( )&svars,%str( ),%str( )q_));

/* for each obs in s, loop over the id-matching q observations */
data nonmatches;
  /* load q data into a hash */
  if _n_ = 1 then do;
    if 0 then set q;
    dcl hash h(dataset:'q', multidata:'y');
    h.defineKey('id');
    h.defineData(all:'y');
    h.defineDone();
  end;
  retain OK 0;

  /* for each obs in s */
  set s;
  array qarr &qvars;
  array sarr &svars;
  rc = h.find();
  do while (rc = OK);
    if lang = q_lang then do i = 1 to dim(qarr);
      if qarr[i] ^= sarr[i] then do;
        var = vname(sarr[i]);
        qval = qarr[i];
        sval = sarr[i];
        output;
      end;
    end;
    rc = h.find_next();
  end;
  keep id lang var qval sval;
run;

/* check */
proc print data=nonmatches noobs;
run;
/* on lst
id  lang  var  qval  sval
 2   b    v2     4     7
 2   b    v1     5     3
 2   b    v2     6     7
*/
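The efficiency argument (visit only the id-matching Q rows, never all of Q) is the same in any language with a keyed multimap. Here is a Python sketch of the same join using a dict of lists in place of the multidata hash; the function name nonmatches and the row-as-dict layout are assumptions for illustration.

```python
from collections import defaultdict


def nonmatches(s_rows, q_rows, var_names):
    """For each S row, visit only the Q rows with the same id and report the
    variables whose values differ when the languages agree."""
    q_by_id = defaultdict(list)  # plays the role of the multidata hash
    for q in q_rows:
        q_by_id[q["id"]].append(q)
    out = []
    for s in s_rows:
        for q in q_by_id.get(s["id"], []):  # like h.find() followed by h.find_next()
            if s["lang"] != q["q_lang"]:
                continue
            for var in var_names:
                if q["q_" + var] != s[var]:
                    out.append((s["id"], s["lang"], var, q["q_" + var], s[var]))
    return out


S = [dict(id=1, lang="a", v1=1, v2=2), dict(id=2, lang="b", v1=3, v2=7),
     dict(id=3, lang="x", v1=8, v2=9)]
Q = [dict(id=1, q_lang="a", q_v1=1, q_v2=2), dict(id=2, q_lang="b", q_v1=3, q_v2=4),
     dict(id=2, q_lang="b", q_v1=5, q_v2=6), dict(id=3, q_lang="c", q_v1=9, q_v2=9)]
for row in nonmatches(S, Q, ["v1", "v2"]):
    print(row)
```

On the test data it yields the same three rows as the PROC PRINT above.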

chang_y_chung_hotmail_com · 12-13-2011

I am partial to separating the part that is repeated from the part that loops. This makes it simple to test the repeated part. I am also in favor of making the macro arguments formal and declaring all local macro variables explicitly. One good side effect of using formal arguments is that it becomes clear who is responsible for quoting what. For instance, if the macro caller does not quote the comma within the names parameter value, then the macro invocation itself will fail; the macro author cannot do anything about it, so the quoting responsibility naturally falls on the macro user. Within the macro, however, the macro author should use the %qscan function instead of %scan when special characters are expected, because %scan returns unquoted text. hth

%macro report(name, ccy);
  %put ***&name***&ccy***;
%mend report;

%macro e5(names=, ccys=, dlm=#);
  %if %superq(names) = %then %return;
  %if %superq(ccys) = %then %return;
  %local i j name ccy;
  %let i = 1;
  %let name = %qscan(&names, &i, &dlm);
  %do %while (&name ^=);
    %let j = 1;
    %let ccy = %qscan(&ccys, &j, &dlm);
    %do %while (&ccy ^=);
      %report(&name, &ccy)
      %let j = %eval(&j + 1);
      %let ccy = %qscan(&ccys, &j, &dlm);
    %end;
    %let i = %eval(&i + 1);
    %let name = %qscan(&names, &i, &dlm);
  %end;
%mend e5;

%let names = a#b, b#c;
%let ccys = 1#2#3;
%e5(names=%superq(names), ccys=&ccys)
%*-- on log
***a***1***
***a***2***
***a***3***
***b, b***1***
***b, b***2***
***b, b***3***
***c***1***
***c***2***
***c***3***
--*;
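The same split between a small repeated unit and the loop that drives it can be shown in Python, where the repeated part is trivially testable on its own. Python has no macro-quoting layer, so the comma inside "b, b" needs no special handling here. The function names mirror the macros but are otherwise illustrative; empty tokens are skipped, as %qscan skips consecutive delimiters.

```python
def report(name, ccy):
    """The repeated part: easy to test in isolation."""
    return f"***{name}***{ccy}***"


def e5(names, ccys, dlm="#"):
    """The looping part: every name crossed with every currency."""
    if not names or not ccys:
        return []
    name_list = [t for t in names.split(dlm) if t]  # skip empty tokens
    ccy_list = [t for t in ccys.split(dlm) if t]
    return [report(name, ccy) for name in name_list for ccy in ccy_list]


for line in e5("a#b, b#c", "1#2#3"):
    print(line)
```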

chang_y_chung_hotmail_com · 11-29-2011

If you *really* want to have the three-letter month name lower-cased, then here is one way. hth.

/* create a custom date format, datedash, for the date range 1jan2010 - 31dec2030 */
data f;
  retain fmtname "datedash";
  drop d;
  do d = '1jan2010'd to '31dec2030'd;
    start = d;
    label = catx('-', put(day(d), z2.0), lowcase(put(d, monname3.)), put(d, year2.));
    output;
  end;
  hlo = "other";
  label = repeat("*",9-1);
  output;
run;

proc format cntlin=f;
run;

/* check */
data _null_;
  do d = '31dec2009'd, '1sep2010'd, '15dec2015'd, '31dec2030'd, '1jan2031'd;
    put d= :datedash.;
  end;
run;
/* on log
d=*********
d=01-sep-10
d=15-dec-15
d=31-dec-30
d=*********
*/
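For illustration, the same mapping in Python: two-digit day, lower-case three-letter month, two-digit year, and nine asterisks outside the supported range (the role of the "other" entry in the CNTLIN dataset). The function name datedash simply reuses the format's name; the explicit month list avoids locale-dependent strftime output.

```python
from datetime import date

MONTHS = ["jan", "feb", "mar", "apr", "may", "jun",
          "jul", "aug", "sep", "oct", "nov", "dec"]


def datedash(d, lo=date(2010, 1, 1), hi=date(2030, 12, 31)):
    """dd-mon-yy with a lower-case month; asterisks outside [lo, hi]."""
    if not lo <= d <= hi:
        return "*" * 9
    return f"{d.day:02d}-{MONTHS[d.month - 1]}-{d.year % 100:02d}"


for d in [date(2009, 12, 31), date(2010, 9, 1), date(2015, 12, 15),
          date(2030, 12, 31), date(2031, 1, 1)]:
    print(f"d={datedash(d)}")
```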

chang_y_chung_hotmail_com · 11-23-2011

All 7-bit ASCII characters are also valid UTF-8 encoded Unicode characters; the UTF-8 encoding scheme was specifically designed this way, and each such character is encoded as the same single byte. As long as your input characters are ASCII (no extended characters), your output should be the same. hth.
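A quick way to see this, sketched in Python: for pure-ASCII text the ASCII and UTF-8 encodings are byte-for-byte identical, while an extended character such as é becomes a two-byte sequence in UTF-8. The helper name is hypothetical.

```python
def same_bytes_in_ascii_and_utf8(text):
    """True when the ASCII and UTF-8 encodings of text are byte-for-byte identical."""
    try:
        return text.encode("ascii") == text.encode("utf-8")
    except UnicodeEncodeError:  # text contains a non-ASCII character
        return False


print(same_bytes_in_ascii_and_utf8("Hello, SAS!"))  # True
print("é".encode("utf-8"))                          # b'\xc3\xa9': two bytes, not one
```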

chang_y_chung_hotmail_com · 10-26-2011

See if an old SAS-L posting of mine helps.

chang_y_chung_hotmail_com · 09-28-2011

Since there are only 217 five-digit (perfect) square numbers, even an exhaustive search does not take long to run: about 5-6 seconds on my 32-bit Windows box. hth.

/* 5-digit (perfect) square numbers only */
data one;
  do i = 1e2 to 1e3;
    j = i**2;
    if j < 1e4 then continue;
    if j > 1e5 - 1 then stop;
    v = put(j, 5.0);
    keep v;
    output;
  end;
run;

proc fcmp outlib=work.func.quiz;
  /* returns > 1000 if any condition is not met */
  function usedOnceDigit(v1 $, v2 $, v3 $);
    length v $15 nDigits usedOnceDigit nUsedOnceDigits i j 8 c $1;
    array freq[10] (0 0 0 0 0 0 0 0 0 0);
    v = cat(v1, v2, v3);
    nDigits = 0;
    usedOnceDigit = .;
    nUsedOnceDigits = 0;
    do i = 1 to 10;
      c = substr("1234567890", i, 1);
      freq[i] = 15 - length(compress(v, c));
      if freq[i] > 0 then do;
        if i = freq[i] then return(1005);
        if freq[i] >= 10 then return(1004);
        if index(v, substr("1234567890", freq[i], 1)) = 0 then return(1004);
        nDigits + 1;
        if nDigits > 5 then return(1002);
        do j = 1 to i-1;
          if freq[j] > 0 and freq[j] = freq[i] then return(1003);
        end;
        if freq[i] = 1 then do;
          usedOnceDigit = i;
          nUsedOnceDigits + 1;
        end;
      end;
    end;
    if nDigits < 5 then return(1002);
    if nUsedOnceDigits ^= 1 then return(1006);
    return(usedOnceDigit);
  endsub;
quit;

%let cmplib = %sysfunc(getoption(cmplib));
options cmplib = (work.func &cmplib);

proc sql;
  select "answer is: ", v1, v2, v3
  from (
    select o1.v as v1, o2.v as v2, o3.v as v3
         , usedOnceDigit(o1.v, o2.v, o3.v) as usedOnce
    from one as o1, one as o2, one as o3
    where o1.v < o2.v and o2.v < o3.v and calculated usedOnce < 1000
  )
  group by usedOnce
  having count(*) = 1;
quit;
/* on lst
                v1      v2      v3
-----------------------------------
answer is:   12321   33124   34225
*/

options cmplib=(&cmplib);
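The puzzle statement itself is not quoted in the post, so the conditions below are my reading of the checks in usedOnceDigit(): exactly five different digits; each digit used a different number of times; every count is itself one of the digits used; no digit used as many times as its own value; and the answer is the one triple whose once-used digit pins it down uniquely (the GROUP BY ... HAVING COUNT(*) = 1). Here is a brute-force Python sketch of the same search; the function names are my own.

```python
from collections import Counter, defaultdict
from itertools import combinations


def five_digit_squares():
    """All five-digit perfect squares, 100**2 = 10000 through 316**2 = 99856."""
    return [str(i * i) for i in range(100, 317)]


def used_once_digit(a, b, c):
    """Return the digit used exactly once if every condition holds, else None."""
    s = a + b + c
    if len(set(s)) != 5:                # exactly five different digits
        return None
    counts = Counter(s)
    if len(set(counts.values())) != 5:  # each digit used a different number of times
        return None
    for digit, n in counts.items():
        if n == int(digit):             # no digit used as many times as its own value
            return None
        if str(n) not in counts:        # every count is itself one of the digits used
            return None
    once = [d for d, n in counts.items() if n == 1]
    return once[0] if len(once) == 1 else None


def solve():
    by_once = defaultdict(list)
    for a, b, c in combinations(five_digit_squares(), 3):  # a < b < c
        digit = used_once_digit(a, b, c)
        if digit is not None:
            by_once[digit].append((a, b, c))
    # like GROUP BY usedOnce HAVING COUNT(*) = 1
    return [triples[0] for triples in by_once.values() if len(triples) == 1]


print(solve())
```

It scans all 217-choose-3 (about 1.7 million) triples, so expect a run time of a few seconds in CPython; the cheap set() test rejects most triples before any counting is done.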

chang_y_chung_hotmail_com · 09-21-2011

The OP's problem is one of finding (connected) components in a graph, once we convert the given data into one. Below is my attempt, which implements the depth-first search (DFS) based component-finding algorithm on a graph represented as an edge list stored in a hash of hashes. It runs pretty fast: on my 32-bit Windows desktop, it usually takes 10 to 20 seconds or so for N=10000. But it consumes an enormous amount of memory. For N=10000 the code asked for 910M at peak and used 512M while running, most of which seems to be overhead for the hash objects. Enjoy.

options nocenter mprint fullstimer;

/* test data */
%let N=10000;
%let seed=20110921;
data one;
  length id 4 v1-v4 $3;
  array arr v1-v4;
  alpha = "abcdefghijklmnopqrstuvwxyz";
  alphaLen = vlength(alpha);
  vlen = vlength(v1);
  do id = 1 to &N;
    call missing(of v1-v4);
    do i = 1 to 4;
      do j = 1 to vlen;
        p = ceil(alphaLen*ranuni(&seed));
        substr(arr[i],j,1) = substr(alpha,p,1);
      end;
    end;
    output;
    keep id v1-v4;
  end;
run;

/* a tiny test dataset -- uncomment to use this one
data one;
  input id (v1 v2 v3 v4) ($);
cards;
1 a b c d      the graph has four (connected) components,
2 a e f g      including two singletons, like so:
3 . . . .
4 i j k l      1 - 2 - 6 - 9
5 . . m .            \ |
6 . n f o              8
7 . . m .      3
8 . n . g      5 - 7
9 . . . o      4
;
run;
*/

/* step one: convert the data into a graph represented as an edge list. */
%macro matches;
  %local i j k;
  %let k = 1;
  %do i = 1 %to 4;
    %do j = &i %to 4;
      %let k = %eval(&k + 1);
      union all
      select m&k..id as tail, n&k..id as head
      from one as m&k., one as n&k.
      where m&k..v&i = n&k..v&j and not missing(m&k..v&i)
    %end;
  %end;
%mend matches;

proc sql _method;
  drop index id on one;
  create index id on one (id);
  drop index v1 on one;
  create index v1 on one (v1);
  drop index v2 on one;
  create index v2 on one (v2);
  drop index v3 on one;
  create index v3 on one (v3);
  drop index v4 on one;
  create index v4 on one (v4);
  create table g as
  select distinct mn.tail, mn.head
  from (
    select m1.id as tail, n1.id as head
    from one as m1, one as n1
    where m1.id=n1.id
    %matches
  ) as mn
  order by mn.tail, mn.head;
quit;

/* step two: find the connected components and singletons by a
   depth-first search (DFS) of a hash-of-hashes representation of the graph.
   the limit on the depth of recursive link blocks is 11 by default, but
   it can be changed with the stack= data statement option. */
data c(rename=(tail=vertex cid=component)) /stack=&N;
  retain OK top component 0;
  dcl hash tails();
  dcl hash heads();
  dcl hash stack();
  dcl hiter ti;
  dcl hiter hi;
  link main;

main:
  link loadData;
  do rc = ti.first() by 0 while (rc = OK);
    tails.find();
    if missing(cid) then do;
      component + 1;
      cid = component;
      tails.replace();
      link connect;
    end;
    rc = ti.next();
  end;
  link doOutput;
  stop;
  return;

connect:
  link pushStack;
  hi = _new_ hiter('heads');
  do rc = hi.first() by 0 while (rc = OK);
    tail = head;
    tails.find();
    if missing(cid) then do;
      cid = component;
      tails.replace();
      link connect;
    end;
    rc = hi.next();
  end;
  link popStack;
  return;

loadData:
  /* load data into a hash of hashes */
  tails = _new_ hash(ordered:'a');
  tails.defineKey('tail');
  tails.defineData('tail', 'cid', 'heads');
  tails.defineDone();
  do until (end);
    set g end=end;
    by tail;
    call missing(cid);
    if first.tail then do;
      heads = _new_ hash(ordered:'a');
      heads.defineKey('head');
      heads.defineData('head');
      heads.defineDone();
      tails.replace();
    end;
    heads.replace();
  end;
  put "NOTE: done loading data into memory hash-of-hashes.";
  ti = _new_ hiter('tails');
  /* a call stack */
  stack = _new_ hash(ordered:'a');
  stack.defineKey('top');
  stack.defineData('tail', 'cid', 'heads', 'head', 'rc', 'hi');
  stack.defineDone();
  return;

popStack:
  if stack.find() ^= OK then return;
  stack.remove();
  top + (-1);
  return;

pushStack:
  top + 1;
  stack.add();
  return;

doOutput:
  do rc = ti.first() by 0 while (rc = OK);
    keep tail cid;
    output c;
    rc = ti.next();
  end;
  return;
run;

/* check -- how many clusters? */
proc sql;
  select size as 'component size'n, count(*) as 'number of components'n
  from (
    select component as componentID, count(*) as size
    from c
    group by component
  )
  group by size
  order by size desc;
quit;
/* on lst:
for the small dataset
component     number of
     size    components
-----------------------
        5             1
        2             1
        1             2

for the larger one
component     number of
     size    components
-----------------------
     9964             1
        2             3
        1            30
*/
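For comparison, here is a Python sketch of the same component search on the tiny test dataset. It swaps in two different techniques, so it is not a translation of the SAS step. Instead of the SQL self-join, it builds edges from a value-to-ids index: two ids are linked when they share any non-missing value in any column, which is what the %matches union produces. Instead of recursive LINK blocks with a hand-made call stack, it runs an iterative DFS with an explicit stack. The function name components and the dict-of-lists input layout are assumptions.

```python
from collections import Counter, defaultdict


def components(rows):
    """rows: {id: [v1, v2, v3, v4]} with None for missing values.
    Returns {id: component number}; ids sharing any value are connected."""
    by_value = defaultdict(list)
    for rid, values in rows.items():
        for v in values:
            if v is not None:
                by_value[v].append(rid)
    graph = {rid: set() for rid in rows}  # every id is a vertex, even singletons
    for ids in by_value.values():
        for a in ids:
            graph[a].update(ids)
    comp_of = {}
    n_comp = 0
    for start in graph:
        if start in comp_of:
            continue
        n_comp += 1
        comp_of[start] = n_comp
        stack = [start]  # explicit stack instead of recursion
        while stack:
            node = stack.pop()
            for nxt in graph[node]:
                if nxt not in comp_of:
                    comp_of[nxt] = n_comp
                    stack.append(nxt)
    return comp_of


TINY = {1: list("abcd"), 2: list("aefg"), 3: [None] * 4, 4: list("ijkl"),
        5: [None, None, "m", None], 6: [None, "n", "f", "o"],
        7: [None, None, "m", None], 8: [None, "n", None, "g"],
        9: [None, None, None, "o"]}
sizes = Counter(Counter(components(TINY).values()).values())
print(sorted(sizes.items(), reverse=True))  # (component size, number of components)
```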

chang_y_chung_hotmail_com · 08-11-2011

Here is one way. HTH.

/* test data */
data one;
  input gvkey execid year ceo cfo other;
cards;
103 6 1992 0 0 1
103 6 1993 1 0 0
103 6 1994 1 0 0
259 6 1996 0 0 1
259 6 1997 0 0 1
259 6 1998 1 0 0
259 6 1999 1 0 0
259 6 2000 1 0 0
1990 6 2003 0 0 1
1990 6 2004 0 0 1
1990 6 2005 0 1 0
1990 6 2006 0 1 0
34 19 1994 0 0 1
34 19 1995 0 0 1
34 19 1996 0 0 1
34 19 1997 0 0 1
2503 19 1998 0 0 1
2503 19 1999 1 0 0
2503 19 2000 1 0 0
2503 19 2001 1 0 0
2503 19 2002 1 0 0
;
run;

/* compress into exec-company */
%let vars = gvkey execid from to ceo cfo other;
%let varsc = %sysfunc(translate(&vars,%str(,), %str( )));
data two;
  call missing(&varsc);
  do until (last.gvkey);
    set one;
    by execid gvkey notsorted;
    if first.gvkey then from = year;
    if last.gvkey then to = year;
    ceos = sum(ceos, ceo);
    cfos = sum(cfos, cfo);
    others = sum(others, other);
  end;
  ceo = ceos > 0;
  cfo = cfos > 0;
  other = others > 0;
  keep &vars;
run;

/* pair previous exec-co observation with current */
data three;
  set two;
  by execid notsorted;
  exceo = ifn(first.execid, ., lag(ceo));
  excfo = ifn(first.execid, ., lag(cfo));
  exother = ifn(first.execid, ., lag(other));
run;

/* summarize */
data _null_;
  set three end=end;
  array count[1:9] _temporary_ (9*0);
  if exceo then do;
    if ceo then count[1] + 1;
    if cfo then count[2] + 1;
    if other then count[3] + 1;
  end;
  if excfo then do;
    if ceo then count[4] + 1;
    if cfo then count[5] + 1;
    if other then count[6] + 1;
  end;
  if exother then do;
    if ceo then count[7] + 1;
    if cfo then count[8] + 1;
    if other then count[9] + 1;
  end;
  if end then do;
    put @1 "ex" @10 "current";
    put @1 " " @10 "CEO" @20 "CFO" @30 "Other";
    put @1 "CEO" @10 count[1] @20 count[2] @30 count[3];
    put @1 "CFO" @10 count[4] @20 count[5] @30 count[6];
    put @1 "Other" @10 count[7] @20 count[8] @30 count[9];
  end;
run;
/* on log
ex       current
         CEO       CFO       Other
CEO      1         1         2
CFO      0         0         0
Other    2         1         3
*/
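The same three steps in a Python sketch: collapse each executive-company spell into role flags (a role counts if it was held in any year of the spell), pair each spell with the previous one for the same executive, and tally the (previous role, current role) combinations. Grouping consecutive rows with itertools.groupby mirrors BY ... NOTSORTED; the function name role_transitions and the tuple layout are assumptions for illustration.

```python
from itertools import groupby

ROLES = ("ceo", "cfo", "other")

# (gvkey, execid, year, ceo, cfo, other), in the same order as the SAS test data
ROWS = [
    (103, 6, 1992, 0, 0, 1), (103, 6, 1993, 1, 0, 0), (103, 6, 1994, 1, 0, 0),
    (259, 6, 1996, 0, 0, 1), (259, 6, 1997, 0, 0, 1), (259, 6, 1998, 1, 0, 0),
    (259, 6, 1999, 1, 0, 0), (259, 6, 2000, 1, 0, 0),
    (1990, 6, 2003, 0, 0, 1), (1990, 6, 2004, 0, 0, 1), (1990, 6, 2005, 0, 1, 0),
    (1990, 6, 2006, 0, 1, 0),
    (34, 19, 1994, 0, 0, 1), (34, 19, 1995, 0, 0, 1), (34, 19, 1996, 0, 0, 1),
    (34, 19, 1997, 0, 0, 1),
    (2503, 19, 1998, 0, 0, 1), (2503, 19, 1999, 1, 0, 0), (2503, 19, 2000, 1, 0, 0),
    (2503, 19, 2001, 1, 0, 0), (2503, 19, 2002, 1, 0, 0),
]


def role_transitions(rows):
    """Count (previous role, current role) pairs across consecutive
    exec-company spells of the same executive."""
    counts = {(p, c): 0 for p in ROLES for c in ROLES}
    for _, exec_rows in groupby(rows, key=lambda r: r[1]):        # by execid
        spells = []
        for _, spell in groupby(exec_rows, key=lambda r: r[0]):   # by gvkey
            spell = list(spell)
            spells.append({role: any(r[3 + k] for r in spell)
                           for k, role in enumerate(ROLES)})
        for prev, cur in zip(spells, spells[1:]):  # like the LAG pairing
            for p in ROLES:
                for c in ROLES:
                    if prev[p] and cur[c]:
                        counts[(p, c)] += 1
    return counts


table = role_transitions(ROWS)
for p in ROLES:
    print(p, [table[(p, c)] for c in ROLES])
```

On the test data this gives the same 3 x 3 table as the log above.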

Date Last Visited	09-01-2015 07:11 AM

Re: How to create a macro date9 variable that's one day less than anot...

Re: Conditional Check if Macro variable value not resolved?

Re: Macro within data step

Re: Remove leading and trailing zeros from character field

Re: Remove leading and trailing zeros from character field

Re: Including a program within itself

Re: Repeat a macro

Re: Could you help to understand this DoW loop code?

Re: Is it ever important not to use a period behind a macro variable?

Re: Is it ever important not to use a period behind a macro variable?

Re: Datalines VS Cards.

Re: Macro within data step

Re: Extracting a part of the string

Re: Assigning a random numeric ID within a series of numbers

Re: Usage of macro variables in data step and table name

Re: Convert Long filename to Short (Dos) filename

Re: how to find the execution plan of proc sql?

Re: Remove leading and trailing zeros from character field

Re: SAS Macro Issue with Data values

Re: datepart subtracting time

Re: Strongly connected component in oriented graph.Procedures(tools) o...

Re: Datastepping - beyond my skills..

Re: Import a flat file of long string of names into SAS

Re: Please help me to get rid of the data step “data have; set have; ...

Re: %eval return an boolean value, how?

Simuation--please help

Re: X Windows Command - Cd to a folder directory with "&"

Re: Assign a variable name to a variable in the data step

Re: %scan has too many arguments

Re: Reg:Date in DD-MMM-YY

Unicode Output

Quoting program name & file path argument in filename pipe on Windows

Re: A simple math puzzle

Re: Giving one Unique No

resolution for missing lead function