About Junyong

Junyong · ‎02-06-2021

Two replies, Re: Need alternative of "Filename" statement with Pipe operator for SAS University Edition and Re: Bulk Download all my files from SAS Ondemand, introduce a way to read filenames recursively via modify, replace, and dread without pipe.

    data i;
        length i j $800;
        i='!USERPROFILE\Desktop';
    run;

    data i;
        modify i;
        a=filename("i",catx("\",i,j)); /*point fileref I at the path i\j*/
        a=dopen("i");                  /*nonzero only if the path opens as a directory*/
        if a then do;
            i=catx("\",i,j);
            j="";
        end;
        replace;                       /*write the current observation back*/
        if a;                          /*continue only for directories*/
        do k=1 to dnum(a);
            j=dread(a,k);
            output;                    /*append one observation per member*/
        end;
        a=dclose(a);
        a=filename("i");
    run;

I usually use neither modify nor replace, so why can the code above read filenames recursively while the code below cannot?

    data i;
        length i j $800;
        i='!USERPROFILE\Desktop';
        a=filename("i",catx("\",i,j));
        a=dopen("i");
        if a then do;
            i=catx("\",i,j);
            j="";
        end;
        if a;
        do k=1 to dnum(a);
            j=dread(a,k);
            output;
        end;
        a=dclose(a);
        a=filename("i");
    run;

I wonder whether the two separate data steps are necessary.
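One possible reading of the difference, offered as a sketch rather than a definitive answer: a step that starts with modify iterates once per observation of the data set, and observations appended by output may themselves be read by later iterations of the same step, whereas a step with no modify or set statement runs only once. The toy example below tries to isolate that behavior without touching the file system. The data set Q and the variable N are made up for illustration.

```sas
/* Hypothetical toy data set: one starting observation. */
data q;
    n=1;
run;

/* MODIFY reads Q sequentially; OUTPUT appends to Q itself.      */
/* Because an explicit OUTPUT is present, the observation that   */
/* was read is not rewritten automatically.                      */
data q;
    modify q;
    if n<5 then do;
        n=n+1;
        output;   /* append a new observation to the end of Q */
    end;
run;

proc print data=q;
run;
```

If appended observations are read by later iterations, as the recursive listing suggests, Q would hold n=1 through n=5; if they are not, it would hold only n=1 and n=2. The condition n<5 keeps the step from growing the data set indefinitely.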

Junyong · ‎02-05-2021

Thanks. As mentioned, "How to: List all the files in a folder" and "Bulk Download all my files from SAS Ondemand" answer this question.

Junyong · ‎02-05-2021

At SAS OnDemand, I have several directories, each containing multiple files, and want to download a whole directory rather than one file at a time. (1) Neither right-clicking the directory and downloading it nor clicking the download button with the directory selected is possible. (2) I tried to zip all the files in the directory by reading the file list via pipe, but pipe is not available in SAS OnDemand. Do I have to download the files one at a time from SAS OnDemand?
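A possible workaround that avoids pipe, sketched under assumptions: the ZIP access method of the FILENAME statement, together with the FCOPY function, can copy one file into a ZIP archive as a byte stream. The paths and the member name below are hypothetical, and whether this is permitted on a given SAS OnDemand account is also an assumption.

```sas
/* Hypothetical paths: replace with a real file and target archive. */
filename src "~/mydir/file1.csv" recfm=n;                   /* source, read as bytes  */
filename dst zip "~/mydir.zip" member="file1.csv" recfm=n;  /* member inside the ZIP  */

data _null_;
    rc=fcopy("src","dst");      /* 0 means the copy succeeded */
    if rc ne 0 then do;
        msg=sysmsg();           /* text of the last error     */
        put "FCOPY failed: " msg;
    end;
run;

filename src clear;
filename dst clear;
```

Repeating the copy for each file name, for example names collected with dread, should add the remaining files to the same archive, so that only the single ZIP file needs to be downloaded.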

Junyong · ‎01-18-2021

%SYSLPUT is submitted in a (most likely local) session and creates or copies macro variables in the server session. Here, %SYSLPUT passes both local and global macro variables from the local session to the server session, but only as global macro variables.

    %macro have(local_var_at_local);
        %syslput local_var_at_server=&local_var_at_local;
        rsubmit;
            %put &local_var_at_server is a local macro variable at the server session.;
        endrsubmit;
    %mend;
    %have
    rsubmit;
        %put &local_var_at_server is available though was local inside have macro;
    endrsubmit;

I want to create a server-side local (rather than global) macro variable from a local-side local macro variable.

Junyong · ‎01-18-2021

%syslput and %sysrput assign a global macro variable as a result.

    /*remote access*/
    %let wrds=wrds.wharton.upenn.edu 4016;
    signon wrds username=_prompt_;

    /*pass local macro ONE to remote*/
    %macro one(one);
        %syslput one=&one;
        rsubmit;
            %put &one;
        endrsubmit;
    %mend;

    /*prints 1234 remotely*/
    %one(1234)

    /*ONE works globally*/
    rsubmit;
        %sysrput one=&one;
    endrsubmit;

    /*prints 1234 here*/
    %put &one;

Can I assign a local macro variable in this case?

Junyong · ‎01-17-2021

My apologies. In short, I wonder whether there is a meaningful performance gain or loss among the following four approaches (ceteris paribus).

    data i1;
        set i;
        where i1>0 & i2>0;
        keep i1-i1000;
    run;

    data i2;
        set i(where=(i1>0 & i2>0));
        keep i1-i1000;
    run;

    data i3;
        set i(keep=i1-i1000);
        where i1>0 & i2>0;
    run;

    data i4;
        set i(keep=i1-i1000 where=(i1>0 & i2>0));
    run;

It seems I1 reads and writes all 5,000 variables before applying KEEP while I3 doesn't, but I3 takes two steps while I1 takes only one.

Junyong · ‎01-17-2021

For a SAS data set like

    data i;
        array i(5000) i1-i5000;
        do j=1 to 5000;
            do k=1 to 5000;
                i(k)=rannor(1);
            end;
            output;
        end;
        drop j k;
    run;

one may be able to subset via KEEP and WHERE as follows.

    data i1;
        set i;
        where i1>0 & i2>0;
        keep i1-i1000;
    run;

The DATA step above will apply WHERE first but read and write all 5,000 variables before applying KEEP. There may be three more versions of this, as follows.

    data i2;
        set i(where=(i1>0 & i2>0));
        keep i1-i1000;
    run;

    data i3;
        set i(keep=i1-i1000);
        where i1>0 & i2>0;
    run;

    data i4;
        set i(keep=i1-i1000 where=(i1>0 & i2>0));
    run;

1. I thought the second version would be less efficient than the first because it implicitly takes one more step before the explicit DATA step, so I have tried to avoid it unless I need separate WHEREs for multiple sets (for example, before merging multiple data sets).
2. The third version introduces one more implicit step but doesn't read and write all 5,000 variables, so I think there will be a trade-off but am not sure.
3. The fourth version, like the third, applies KEEP first and then WHERE, but for a different reason.

One can also consider PROC SQL when far fewer than the 1,000 variables above are needed.

    proc sql;
        create table i5 as
        select i1,i2,i3,i4,i5,i6,i7,i8,i9,i10
        from i where i1>0 & i2>0;
    quit;

In this case, PROC SQL may consider only the 10 variables stated, so it will be useful unless sequential access (such as LAG) is required. Many documents say IF and WHERE are different, but I am not sure whether there is an important performance difference between placing KEEP and WHERE inside the parentheses (as data set options) and outside them (as statements), if there is nothing more to be considered. It seems there is a trade-off between introducing one more implicit step and not rewriting all the unnecessary variables (or the performance may be only marginally different, as the following log shows).

    1    data i;
    2    array i(5000) i1-i5000;
    3    do j=1 to 5000;
    4    do k=1 to 5000;
    5    i(k)=rannor(1);
    6    end;
    7    output;
    8    end;
    9    drop j k;
    10   run;
    NOTE: The data set WORK.I has 5000 observations and 5000 variables.
    NOTE: DATA statement used (Total process time):
          real time 1.70 seconds
          cpu time 1.68 seconds
    11
    12   data i1;
    13   set i;
    14   where i1>0 & i2>0;
    15   keep i1-i1000;
    16   run;
    NOTE: There were 1207 observations read from the data set WORK.I.
          WHERE (i1>0) and (i2>0);
    NOTE: The data set WORK.I1 has 1207 observations and 1000 variables.
    NOTE: DATA statement used (Total process time):
          real time 0.17 seconds
          cpu time 0.15 seconds
    17
    18   data i2;
    19   set i(where=(i1>0 & i2>0));
    20   keep i1-i1000;
    21   run;
    NOTE: There were 1207 observations read from the data set WORK.I.
          WHERE (i1>0) and (i2>0);
    NOTE: The data set WORK.I2 has 1207 observations and 1000 variables.
    NOTE: DATA statement used (Total process time):
          real time 0.17 seconds
          cpu time 0.15 seconds
    22
    23   data i3;
    24   set i(keep=i1-i1000);
    25   where i1>0 & i2>0;
    26   run;
    NOTE: There were 1207 observations read from the data set WORK.I.
          WHERE (i1>0) and (i2>0);
    NOTE: The data set WORK.I3 has 1207 observations and 1000 variables.
    NOTE: DATA statement used (Total process time):
          real time 0.17 seconds
          cpu time 0.17 seconds
    27
    28   data i4;
    29   set i(keep=i1-i1000 where=(i1>0 & i2>0));
    30   run;
    NOTE: There were 1207 observations read from the data set WORK.I.
          WHERE (i1>0) and (i2>0);
    NOTE: The data set WORK.I4 has 1207 observations and 1000 variables.
    NOTE: DATA statement used (Total process time):
          real time 0.15 seconds
          cpu time 0.15 seconds
    31   proc sql;
    32   create table i5 as
    33   select i1,i2,i3,i4,i5,i6,i7,i8,i9,i10
    34   from i where i1>0 & i2>0;
    NOTE: Table WORK.I5 created, with 1207 rows and 10 columns.
    35   quit;
    NOTE: PROCEDURE SQL used (Total process time):
          real time 0.17 seconds
          cpu time 0.15 seconds

Thanks for all your help.
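Beyond timing, it may help to confirm that the statement form and the data set option form produce identical output. The sketch below rebuilds a much smaller version of the data and compares two of the variants with PROC COMPARE. The sizes (500 observations, 50 variables, keeping i1-i10) and the data set names SMALL, S1, and S4 are arbitrary assumptions chosen for speed.

```sas
options fullstimer;   /* ask for more detailed resource statistics in the log */

/* Smaller, hypothetical stand-in for the 5000 x 5000 data set. */
data small;
    array i(50) i1-i50;
    do j=1 to 500;
        do k=1 to 50;
            i(k)=rannor(1);
        end;
        output;
    end;
    drop j k;
run;

/* KEEP and WHERE as statements, outside the parentheses. */
data s1;
    set small;
    where i1>0 & i2>0;
    keep i1-i10;
run;

/* KEEP= and WHERE= as data set options, inside the parentheses. */
data s4;
    set small(keep=i1-i10 where=(i1>0 & i2>0));
run;

/* Report any differences between the two results. */
proc compare base=s1 compare=s4;
run;
```

The same comparison can be repeated for the other variants by changing the COMPARE= data set.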

Junyong · ‎01-14-2021

Thanks for this intuitive example. I had no idea about this issue, so I read the IEEE 754 page above. I will stick to the default length unless the values are integers.

Junyong · ‎01-14-2021

Thanks for this clarification again. It seems changing the default length is ill-suited to non-integers. However, it seems it still pays to cut the data size because IT services such as Dropbox and Spectrum are not frictionless. I think compress works better for character variables than for numeric ones.

Junyong · ‎01-14-2021

Thanks. Just in case, my code above imports the full data via url.

Junyong · ‎01-14-2021

I had no idea about this issue, so I visited the IEEE 754 page. It seems deviating from the default length of 8 is unreasonable except for integers such as dates, years, dummies, etc. I much appreciate all these details in response to a novice question.

Junyong · ‎01-14-2021

I have a data set, the first 10 rows of which look like the following.

    year,R_F,R_MKT,R_ME,R_IA,R_ROE,R_EG
    1967,4.1474,24.4192,40.5479,-11.4478,20.6095,-3.2998
    1968,5.2942,8.8747,24.9021,14.7436,-2.4844,11.4650
    1969,6.5912,-17.4274,-11.7458,0.4645,15.4144,12.9056
    1970,6.3829,-6.3099,-7.9029,22.7755,-0.4696,17.1111
    1971,4.3172,11.8817,5.3575,0.9003,11.4332,6.1606
    1972,3.8912,13.4494,-9.0317,5.1487,5.6877,15.0977
    1973,7.0586,-25.8075,-17.1956,7.7738,0.9487,17.2879
    1974,8.0781,-36.0193,4.5604,18.5330,11.4993,20.1174
    1975,5.8210,31.5368,16.9765,7.4978,-6.2017,11.5153

So, each value has at most four digits after the decimal point. There is no explicit bound, but the values will effectively lie between -999 and 999. I ran the following code.

    data want;
        infile "http://global-q.org/uploads/1/2/2/6/122679606/q5_factors_annual_2019a.csv" url firstobs=2 dsd;
        length year 3 R_F R_MKT R_ME R_IA R_ROE R_EG 6;
        input year R_F R_MKT R_ME R_IA R_ROE R_EG;
    run;

I used 6 for the variable length based on this document, but it seems some values are read unusually, as follows. I wonder whether (1) the unusual values such as the 5.2941999999 above are acceptable, and (2) the default length 8 rather than the 6 above must be used for these four-digit values.
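To see what a shorter numeric length does, the sketch below stores the same value with lengths 8 and 6 and then reads both back. The data set T and the variables X8, X6, and DIFF are made up for illustration. LENGTH affects only the copy written to the data set, so the comparison has to happen in a second step.

```sas
/* Store 5.2942 with the default length 8 and with length 6. */
data t;
    length x8 8 x6 6 year 3;
    x8=5.2942;
    x6=5.2942;   /* truncated when written to the data set */
    year=1968;   /* small integer stored in 3 bytes        */
run;

/* Read the stored values back and compare them. */
data _null_;
    set t;
    diff=x8-x6;
    put x8= best32. x6= best32. diff= e12.;
    if year=1968 then put "year was stored exactly";
run;
```

On IEEE 754 platforms, a non-integer such as 5.2942 has no exact binary representation, so dropping two bytes of the mantissa changes the stored value slightly, while small integers such as years are stored exactly.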

Junyong · ‎01-07-2021

I think this explains it, because the original code also works after wrapping the two comma-separated subqueries in one pair of parentheses as follows.

    proc sql;
        create table i as
        select i.ticker,j.date,adjust
        from ((select distinct ticker from yahoo) i,
            (select distinct date from yahoo) j)
            left join yahoo k on i.ticker=k.ticker & j.date=k.date
        order by ticker,date;
    quit;

It seems PROC SQL processes JOIN first and then the comma unless the parentheses are added. Thanks.

Junyong · ‎01-06-2021

I have one unbalanced panel data set and am using SQL to make it balanced by (1) taking the Cartesian product and then (2) left joining the original set. The following data set has TICKER and DATE as two indices. I Cartesian joined distinct TICKER and distinct DATE as follows and then tried to left join ADJUST to the Cartesian product.

    data yahoo;
        input ticker $ @@;
        i=cats("https://query1.finance.yahoo.com/v7/finance/download/",ticker,
            '?period1=-999999999999&period2=999999999999&interval=1d');
        infile j url filevar=i firstobs=2 dsd end=k;
        do until(k);
            input date yymmdd10. +1 open high low close adjust volume;
            output;
        end;
        cards;
    BA DIS KO
    ;

    proc sql;
        create table i as
        select i.ticker,j.date,adjust
        from (select distinct ticker from yahoo) i,
            (select distinct date from yahoo) j
            left join yahoo k on i.ticker=k.ticker & j.date=k.date
        order by ticker,date;
    quit;

The code above prints a correlated reference error message as follows.

    14   proc sql;
    15   create table i as
    16   select i.ticker,j.date,adjust
    17   from (select distinct ticker from yahoo) i,
    18   (select distinct date from yahoo) j
    19   left join yahoo k on i.ticker=k.ticker & j.date=k.date
    20   order by ticker,date;
    ERROR: Correlated reference to column ticker is not contained within a subquery.
    21   quit;
    NOTE: The SAS System stopped processing this step because of errors.
    NOTE: PROCEDURE SQL used (Total process time):
          real time 0.01 seconds
          cpu time 0.00 seconds

I reviewed this note but couldn't understand it, because the ON clause of my LEFT JOIN uses only variables already introduced: I.TICKER, K.TICKER, J.DATE, and K.DATE. I wonder whether this can be done in one SQL query.

P.S. Using CROSS JOIN rather than just a comma, as follows, works.

    proc sql;
        create table i as
        select i.ticker,j.date,adjust
        from (select distinct ticker from yahoo) i
            cross join (select distinct date from yahoo) j
            left join yahoo k on i.ticker=k.ticker & j.date=k.date
        order by ticker,date;
    quit;

I wonder whether a comma and CROSS JOIN differ in other cases.
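Since the example above depends on an external URL, here is a small self-contained sketch of the same balancing step using the CROSS JOIN form. The data set PANEL and its three rows are made up for illustration and are not real prices.

```sas
/* Hypothetical unbalanced panel: KO has no row for 2020-01-03. */
data panel;
    input ticker $ date :yymmdd10. adjust;
    format date yymmdd10.;
    cards;
BA 2020-01-02 100
BA 2020-01-03 101
KO 2020-01-02 50
;

/* Build every ticker-date pair, then attach ADJUST where it exists. */
proc sql;
    create table balanced as
    select i.ticker, j.date, k.adjust
    from (select distinct ticker from panel) i
        cross join (select distinct date from panel) j
        left join panel k
            on i.ticker=k.ticker and j.date=k.date
    order by i.ticker, j.date;
quit;

proc print data=balanced;
run;
```

BALANCED has four rows, one for each of the two tickers on each of the two dates, with ADJUST missing for KO on 2020-01-03.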

Junyong · ‎12-29-2020

I also thought "What if 3 rather than 4?" just for a moment and realized 8,192 is inappropriate, but thanks again.


How to Capture Part of Log as Macro Variable?

How to Escape Line Break in Long Code Line?

Re: How to Prevent Resolution of Ampersand?

How to Prevent Resolution of Ampersand?

How to Italicize Just One Word in FOOTNOTE?

Re: Applying Arrow Tips to SGPLOT Lines and Axes

Applying Arrow Tips to SGPLOT Lines and Axes

Displaying Values for Histograms

SGPLOT VBAR XAXIS Label Interval?

Reading Tab-Delimited Data with Spaces

Re: In VIEWTABLE, How Can I Directly Go to Certain Observation?

In VIEWTABLE, How Can I Directly Go to Certain Observation?

DO Loop and INFILE FILEVAR Together

Re: How to Download a Folder from SAS OnDemand?

Re: Skipping Invalid Lines

How Can MODIFY and REPLACE Read Filenames Recursively?

Re: How to Download a Folder from SAS OnDemand?

How to Download a Folder from SAS OnDemand?

Re: How to %SYSLPUT or %SYSRPUT Locally?

How to %SYSLPUT or %SYSRPUT Locally?

Re: DATA Step: Subsetting Inside Versus Outside Parentheses

DATA Step: Subsetting Inside Versus Outside Parentheses

Re: Best Variable Length for Numbers with Four Digits After Decimal Po...

Re: Best Variable Length for Numbers with Four Digits After Decimal Po...

Re: Best Variable Length for Numbers with Four Digits After Decimal Po...

Re: Best Variable Length for Numbers with Four Digits After Decimal Po...

Best Variable Length for Numbers with Four Digits After Decimal Point

Re: SQL Cartesian and Then Left Join: Correlated Reference Error?

SQL Cartesian and Then Left Join: Correlated Reference Error?

Re: How to Minimize Data?