09-01-2015
tesu
Calcite | Level 5
Member since
03-27-2012
- 40 Posts
- 1 Likes Given
- 0 Solutions
- 0 Likes Received
Latest posts by tesu
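- 897 views, posted 12-02-2014 07:41 PM
- 897 views, posted 12-02-2014 04:31 PM
- 2977 views, posted 12-02-2014 03:41 PM
- 2977 views, posted 12-02-2014 03:06 PM
- 3027 views, posted 12-01-2014 09:17 PM
- 3027 views, posted 11-30-2014 06:26 PM
- 3027 views, posted 11-30-2014 05:07 PM
- 3374 views, posted 11-30-2014 08:59 AM
- 3018 views, posted 11-29-2014 11:22 PM
- 3208 views, posted 11-29-2014 09:40 PM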
Activity Feed for tesu
- Posted Re: Not enough resources: Simulation and the calculation of pair-wise distances on SAS Programming. 12-02-2014 07:41 PM
- Posted Re: Not enough resources: Simulation and the calculation of pair-wise distances on SAS Programming. 12-02-2014 04:31 PM
- Posted Re: Not enough resources: Simulation and the calculation of pair-wise distances on SAS Programming. 12-02-2014 03:41 PM
- Posted Re: Not enough resources: Simulation and the calculation of pair-wise distances on SAS Programming. 12-02-2014 03:06 PM
- Posted Re: Not enough resources: Simulation and the calculation of pair-wise distances on SAS Programming. 12-01-2014 09:17 PM
- Posted Re: Not enough resources: Simulation and the calculation of pair-wise distances on SAS Programming. 11-30-2014 06:26 PM
- Posted Re: Not enough resources: Simulation and the calculation of pair-wise distances on SAS Programming. 11-30-2014 05:07 PM
- Posted Not enough resources: Simulation and the calculation of pair-wise distances on SAS Programming. 11-30-2014 08:59 AM
- Posted Re: Calculating pair-wise distances using the geodist function on SAS Programming. 11-29-2014 11:22 PM
- Posted Calculating pair-wise distances using the geodist function on SAS Programming. 11-29-2014 09:40 PM
- Posted Re: Reading a "complex" data file on SAS Programming. 05-01-2014 05:27 PM
- Posted Re: Reading a "complex" data file on SAS Programming. 04-30-2014 10:44 PM
- Posted Re: Reading a "complex" data file on SAS Programming. 04-28-2014 07:06 PM
- Posted Reading a "complex" data file on SAS Programming. 04-28-2014 01:13 AM
- Posted Re: Hard question: Selecting observation(s) conditionally by group on SAS Programming. 12-12-2013 03:35 PM
- Posted Hard question: Selecting observation(s) conditionally by group on SAS Programming. 12-12-2013 02:05 AM
- Posted Re: Sorting (ordering) variables of a data set on SAS Programming. 11-18-2013 07:02 PM
- Posted Sorting (ordering) variables of a data set on SAS Programming. 11-18-2013 05:17 PM
- Liked Re: OD matrix (long form to a matrix form) for PGStats. 08-17-2013 11:18 AM
- Posted Re: OD matrix (long form to a matrix form) on SAS Programming. 08-17-2013 11:17 AM
Posts I Liked
12-02-2014
07:41 PM
Patrick, the processing time was: PROCESSING TIME: 107 (mm:ss). It is a great improvement, but one more thing is worth trying: writing C code to calculate the pair-wise distances, because the distance calculation is the most time-consuming task. I can write it in C, but I don't know how to use C code from SAS. After I write the C code, I will come back to the SAS community and post a separate question. Thank you very much for your help. I learned a lot from you. Good old times... William
12-02-2014
04:31 PM
Patrick, I want to avoid producing unnecessary log files. I erased the following lines from your original code:

    options fullstimer msglevel=i;
    %let DTstamp=%sysfunc(datetime(), B8601DT.);
    %let cmd=&cmd -log "&dir_path\%scan(&pgrm,1,.)_&DTstamp..log
    %let cmd=&cmd -print "&dir_path\%scan(&pgrm,1,.)_&DTstamp..lst
    status =S012345_&DTstamp
    mname =taskname

I also added "options nonotes nosource nosource2 errors=0;" after the put, hoping that no log file would be produced for each individual iteration. But I still see the log files prog1.log, prog2.log, ..., prog1000.log in the folder that holds the main SAS code. There may be a way to direct those log files to another directory, but I don't need any of these log files.
12-02-2014
03:41 PM
We are talking to each other in real time right now. Yes, I did it, and it worked. Please see my edited previous reply. I will get back to you with the total computation time for the 1,000 iterations.
12-02-2014
03:06 PM
EDIT: Your code works after I changed the %let sasbat statement to %let sasbat="C:\Program Files\SASHome\SASFoundation\9.3\sas.exe";

=== [ Work history ] ===

Thank you very much again for the update, Patrick. As before, the code did not generate any output. I DID NOT run the second section of your SAS code, which combines the kout data sets. I include the log messages and how the working folder and the test library look. I changed the working folder from c:\test to d:\patrick, and I set the number of task iterations to two: %let rep=2;

========================================
NOTE: The file PRG is:
      Filename=d:\patrick\prog1.sas,
      RECFM=V,LRECL=256,File Size (bytes)=0,
      Last Modified=02Dec2014:15:03:10,
      Create Time=02Dec2014:14:42:40

NOTE: 37 records were written to the file PRG.
      The minimum record length was 0.
      The maximum record length was 100.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      user cpu time       0.00 seconds
      system cpu time     0.00 seconds
      memory              277.93k
      OS Memory           13252.00k
      Timestamp           12/02/2014 03:03:10 PM

NOTE: Task "task0" produced no LOG/Output.
NOTE: DATA statement used (Total process time):
      real time           1.00 seconds
      user cpu time       0.01 seconds
      system cpu time     0.01 seconds
      memory              232.34k
      OS Memory           13252.00k
      Timestamp           12/02/2014 03:03:11 PM

NOTE: The file PRG is:
      Filename=d:\patrick\prog2.sas,
      RECFM=V,LRECL=256,File Size (bytes)=0,
      Last Modified=02Dec2014:15:03:11,
      Create Time=02Dec2014:14:42:41

NOTE: 37 records were written to the file PRG.
      The minimum record length was 0.
      The maximum record length was 100.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      user cpu time       0.00 seconds
      system cpu time     0.00 seconds
      memory              231.50k
      OS Memory           13252.00k
      Timestamp           12/02/2014 03:03:11 PM

NOTE: Task "task1" produced no LOG/Output.
NOTE: DATA statement used (Total process time):
      real time           1.00 seconds
      user cpu time       0.00 seconds
      system cpu time     0.00 seconds
      memory              232.34k
      OS Memory           13252.00k
      Timestamp           12/02/2014 03:03:12 PM
=======================================

Is something going wrong in the batch-mode processing? The main log says "NOTE: Task "task0" produced no LOG/Output." Batch processing is a completely new area to me, but I will study the material. I am searching for cases where SAS batch processing produces no output. Thank you very much again for all of your responses.

EDIT: I am reading this blog post: SAS Programming for Data Mining: Parallel Processing with Single-Thread SAS license. It seems closely relevant to my problem. Is the SYSTASK command part of SAS/CONNECT? My SAS license DOES contain that product.

EDIT: Following your comment in this thread, https://communities.sas.com/message/220782, I changed your %let sasbat statement to %let sasbat="C:\Program Files\SASHome\SASFoundation\9.3\sas.exe";. Now it works. I have the kout data sets both in the test library and in d:\patrick.
12-01-2014
09:17 PM
Thank you very much, Patrick. The running time was about 18 to 19 minutes on my laptop (i3-3120M, 16 GB DDR3 RAM, and an SSD). Great improvement. I will work hard to study the parallel processing you provided, but for now I have just one question. After the job finished, I can't find any "kout&i" data set, which is the output data set from the KDE procedure. I need to collect those data sets to do something else. Each kout&i data set has just one column, the variable density. I need to build a big matrix that holds the 1,000 density&i variables. -- Lots to learn in this world, and life is great! Miscellaneous: Why do we need two dots, as in "c:\test\prog&i..sas", and two slashes, as in "libname test 'c:\test';"//?
11-30-2014
06:26 PM
Thanks for the further guidance, Patrick. To use the BY statement in the KDE procedure, I believe I need to make one big "geodist" data set with an additional group variable that runs from 1 to the number of replicates (rep=1000). I don't know how to modify the hash process for such a big geodist data set, which would have (5500*(5500-1)/2)*1000 = 1.512e+10 observations. Here, 5500*(5500-1)/2 is the number of pair-wise distances in one geodist data set. I need to study the hash object, which is new to me. The PROCESSING TIME for [%do i=1 %to 10;] was 3:48 (mm:ss). Calculating the pair-wise distances (15,122,250 observations) for one location sample takes about 20 seconds. I believe I should write this tedious distance calculation in C and build a DLL, which I could then call from the DATA step. I stick with SAS because the KDE procedure is convenient. Loading a DLL is a separate issue, and I may come back to this community to ask further questions.
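The observation count above is easy to double-check. The C sketch below computes n(n-1)/2 in 64-bit arithmetic, because 1.5e10 does not fit in a 32-bit integer. It also estimates the raw storage needed if every distance were kept as an 8-byte double, ignoring the group variable and any SAS data set overhead. The function names are hypothetical.

```c
#include <stdint.h>

/* Number of unordered pairs i < j among n points: n(n-1)/2.
   64-bit arithmetic is used because 5,500 points x 1,000 replicates
   gives more pairs than a 32-bit integer can hold. */
uint64_t pair_count(uint64_t n)
{
    return n * (n - 1) / 2;
}

/* Pair-wise distances summed over all replicates. */
uint64_t total_pairs(uint64_t n, uint64_t reps)
{
    return pair_count(n) * reps;
}

/* Bytes needed to hold every distance as an 8-byte double
   (distances only; no group variable, no data set overhead). */
uint64_t distance_bytes(uint64_t n, uint64_t reps)
{
    return total_pairs(n, reps) * (uint64_t)sizeof(double);
}
```

With these functions, pair_count(5500) is 15,122,250 and total_pairs(5500, 1000) is 15,122,250,000. distance_bytes(5500, 1000) is 120,978,000,000 bytes, roughly 121 GB, for the distances alone. A single combined data set is therefore very large before the KDE step even starts.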
11-30-2014
05:07 PM
Thanks again, Patrick. The 4-hour running time is quite long for me. I should use some C routines to do the work. -- Your hash table solution is a killer!
11-30-2014
08:59 AM
This question is a follow-up to Patrick's answer on the pair-wise calculation:
https://communities.sas.com/message/239952

My current working SAS code is the following:

    %let datetime_start = %sysfunc(TIME()) ;
    %put START TIME: %sysfunc(datetime(),datetime14.);

    data original (drop = i);
       do i = 1 to 1000000;
          lon = -78 + ranuni(i);
          lat = 37 + ranuni(i+1);
          output;
       end;
    run;

    proc surveyselect data = original method = SRS rep = 1000
                      sampsize = 5500 out = sampledata noprint;
       id _all_;
    run;

    %macro geodist;
       %do i=1 %to 1000;
          data loc&i;
             set sampledata;
             if replicate=&i;
             drop replicate;
             recid+1;
          run;

          data loc&i;
             retain recid;
             set loc&i;
          run;

          data geodist&i(drop=_:);
             set loc&i nobs=nobs;
             if _n_=1 then do;
                if 0 then set loc&i loc&i(keep=lat lon rename=(lat=hlat lon=hlon));
                declare hash h1(dataset:"loc&i(keep=recid lat lon rename=(lat=hlat lon=hlon))");
                _rc=h1.defineKey("recid");
                _rc=h1.defineData("hlat","hlon");
                _rc=h1.defineDone();
             end;
             do _i= (recid+1) to nobs;
                _rc=h1.find(key:_i);
                geodist=geodist(lat, lon, hlat, hlon, "m");
                output;
             end;
          run;

          proc kde data=geodist&i;
             univar geodist/method=srot out=kout&i plots=NONE NOPRINT;
          run;

          data kout&i;
             set kout&i;
             keep density;
          run;
       %end;
    %mend geodist;
    %geodist;

    %put END TIME: %sysfunc(datetime(),datetime14.);
    %put PROCESSING TIME: %sysfunc(putn(%sysevalf(%sysfunc(TIME())-&datetime_start.),mmss.)) (mm:ss) ;

My task is to:

(1) create a data set with 1 million longitude-latitude observations,
(2) sample 5,500 observations 1,000 times (hence, 1,000 data sets),
(3) calculate the pair-wise distances for every data set, and
(4) compute kernel densities.

This is a huge calculation. SAS crashed at about the 80th iteration, complaining that it did not have sufficient resources. My question is whether the code can be modified to improve performance by streamlining the iteration process. If there is no room for improvement, I may have to move to a lower-level language such as Fortran or C.

Thank you very much!
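Step (2) draws 1,000 simple random samples of 5,500 from the 1,000,000 generated points. Here is a partial Fisher-Yates shuffle, a rough C analogue of equal-probability sampling without replacement, which is what PROC SURVEYSELECT METHOD=SRS produces. This is a swapped-in technique. The xorshift64 generator and all names are assumptions for illustration, not SAS's own algorithm or random number stream, so the samples will not match SURVEYSELECT's. Indices are 0-based, unlike recid.

```c
#include <stddef.h>
#include <stdint.h>

/* xorshift64: small self-contained PRNG (an assumption; any good
   generator would do). The state must be non-zero. */
static uint64_t xorshift64(uint64_t *state)
{
    uint64_t x = *state;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    *state = x;
    return x;
}

/* Draw k distinct indices from 0..n-1 (simple random sampling without
   replacement) with a partial Fisher-Yates shuffle.
   pool: scratch array of n elements; out: receives the k sampled indices.
   Returns 0 on success, -1 if k > n or the state is zero. */
int srs_sample(size_t n, size_t k, size_t *pool, size_t *out,
               uint64_t *state)
{
    if (k > n || *state == 0)
        return -1;
    for (size_t i = 0; i < n; i++)
        pool[i] = i;
    for (size_t i = 0; i < k; i++) {
        /* choose j from i..n-1; modulo bias is negligible for n far below 2^64 */
        size_t j = i + (size_t)(xorshift64(state) % (uint64_t)(n - i));
        size_t tmp = pool[i];
        pool[i] = pool[j];
        pool[j] = tmp;
        out[i] = pool[i];
    }
    return 0;
}
```

The same pool and out buffers can be reused for every replicate, so memory stays fixed at one pool of n indices plus one sample of k, rather than one stored data set per replicate. A private generator is used because the C standard only guarantees RAND_MAX >= 32767, which is too small to index 1,000,000 points uniformly with rand().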
11-29-2014
11:22 PM
Thanks, Patrick. I got the following error message:

    9    data want2(drop=_:);
    10     set have nobs=nobs;
    11     if _n_=1 then
    12       do;
    13         if 0 then set have have(keep=lat lon rename=(lat=hlat lon=hlon));
    14         dcl hash h1(dataset:'have(keep=recid lat lon rename=(lat=hlat lon=hlon))');
    15         _rc=h1.defineKey('recid');
    16         _rc=h1.defineData('hlat','hlon');
    17         _rc=h1.defineDone();
    18       end;
    19     do _i= (recid+1) to nobs;
    20       _rc=h1.find(key:_i);
    21       geodist=geodist(lat, lon, hlat, hlon, 'm');
    22       output;
    23     end;
    24   run;

    NOTE: Variable recid is uninitialized.
    ERROR: The variable recid in the DROP, KEEP, or RENAME list has never been referenced.
    ERROR: Hash data set load failed at line 17 column 11.
    ERROR: DATA STEP Component Object failure.  Aborted during the EXECUTION phase.
    NOTE: The SAS System stopped processing this step because of errors.
    NOTE: There were 1 observations read from the data set WORK.HAVE.
    WARNING: The data set WORK.WANT2 may be incomplete.  When this step was stopped there were 0 observations and 6 variables.
    NOTE: DATA statement used (Total process time):
          real time           0.09 seconds
          cpu time            0.01 second

EDIT: It worked after I added the "recid" variable:

    data have;
       input recid lon lat;
       cards;
    1 -75.547513 39.757077
    2 -75.554342 39.749864
    3 -75.555394 39.730672
    4 -75.556227 39.737546
    ;
    run;

Thank you very much!
11-29-2014
09:40 PM
Hi all,

    data _null_;
       d = geodist(39.757077, -75.547513, 39.749864, -75.554342, 'm');
       put d;
    run;

The code above calculates the distance between the two locations: 0.616476842 miles.

    data have;
       input lon lat;
       cards;
    -75.547513 39.757077
    -75.554342 39.749864
    -75.555394 39.730672
    -75.556227 39.737546
    ;
    run;

Now I need to calculate the distance between every possible pair of locations in the "have" data set, so the number of distances is 4*3/2 = 6. (In fact, the real "have" data set has many more observations.) Could you advise me on how to create a new data set that contains the 6 distances? Thank you very much.
04-30-2014
10:44 PM
Thank you very much, Tom. I learned a lot from your code! Your code did not print the MSA code for some large areas, but that is only because of a data problem. For example, the name "Norfolk-Virginia Beach-Newport News, VA-NC MSA" is on one line. However, "New York-Northern New Jersey-Long Island, NY-NJ-CT-PA CMSA" and "Philadelphia-Wilmington-Atlantic City, PA-NJ-DE-MD CMSA" each take two lines. I erased the second lines and got it right in the end.
04-28-2014
07:06 PM
Thank you, guys. I thought I was just too dumb to read this data, but it turns out this task is not an easy one. I will try your suggestions and get back to you.
04-28-2014
01:13 AM
Hi all, I have a problem reading the following data file in SAS. I have tried for about 2 hours with no luck. Any help will be appreciated. https://www.census.gov/population/metro/files/lists/historical/cencty.txt The data file and its layout are attached. Bill
12-12-2013
03:35 PM
Thank you, all. You guys are amazing!