12-17-2021
FredGIII
Quartz | Level 8
Member since
06-23-2011
- 58 Posts
- 7 Likes Given
- 0 Solutions
- 23 Likes Received
Latest posts by FredGIII
Subject, Views, Posted:
- Re: ODS Graphics Designer Code - results not the same run in SAS Studio, 1294 views, 09-24-2019 01:41 PM
- Re: ODS Graphics Designer Code - results not the same run in SAS Studio, 1340 views, 09-24-2019 11:37 AM
- ODS Graphics Designer Code - results not the same run in SAS Studio, 1348 views, 09-24-2019 11:06 AM
- Where are SAS Studio Snippets stored?, 1604 views, 09-18-2019 08:02 AM
- Re: SGPLOT only using half the graph space (offsetmax default value), 2141 views, 06-25-2019 11:27 AM
- SGPLOT only using half the graph space (offsetmax default value), 2166 views, 06-25-2019 11:02 AM
- Re: Run SAS code remotely on a Linux server, 5435 views, 12-06-2018 12:45 PM
- 7650 views, 10-21-2014 11:27 AM
- 7650 views, 10-21-2014 10:30 AM
- 8130 views, 10-21-2014 10:16 AM
Activity Feed for FredGIII
- Got a Like for VA - Slider Controls object option. 09-07-2021 08:47 AM
- Got a Like for Overlay histograms - all data vs subset. 03-25-2020 08:36 PM
- Liked Re: ODS Graphics Designer Code - results not the same run in SAS Studio for DanH_sas. 09-24-2019 01:42 PM
- Posted Re: ODS Graphics Designer Code - results not the same run in SAS Studio on Graphics Programming. 09-24-2019 01:41 PM
- Posted Re: ODS Graphics Designer Code - results not the same run in SAS Studio on Graphics Programming. 09-24-2019 11:37 AM
- Posted ODS Graphics Designer Code - results not the same run in SAS Studio on Graphics Programming. 09-24-2019 11:06 AM
- Liked Re: Where are SAS Studio Snippets stored? for AnandVyas. 09-18-2019 10:02 AM
- Posted Where are SAS Studio Snippets stored? on SAS Studio. 09-18-2019 08:02 AM
- Tagged Where are SAS Studio Snippets stored? on SAS Studio. 09-18-2019 08:02 AM
- Posted Re: SGPLOT only using half the graph space (offsetmax default value) on Graphics Programming. 06-25-2019 11:27 AM
- Liked Re: SGPLOT only using half the graph space (offsetmax default value) for Rick_SAS. 06-25-2019 11:23 AM
- Posted SGPLOT only using half the graph space (offsetmax default value) on Graphics Programming. 06-25-2019 11:02 AM
- Posted Re: Run SAS code remotely on a Linux server on SAS Programming. 12-06-2018 12:45 PM
- Liked Re: VA - Date as parameter - Status changed to: Implemented for AnnaBrown. 05-22-2018 01:21 PM
- Got a Like for VA - Date as parameter. 08-31-2017 07:08 PM
- Got a Like for VA - Date as parameter. 03-20-2017 04:04 PM
- Got a Like for VA - Date as parameter. 03-01-2017 12:00 PM
- Got a Like for VA - Slider Object - set default values on open. 02-23-2017 10:32 AM
- Got a Like for VA - Date as parameter. 02-20-2017 06:48 AM
05-21-2013
01:22 PM
Oh, by the way, Ahmed: the NAS drive is dedicated to just my desktop over its own gigabit Ethernet connection, so I have the full bandwidth available. But I agree with you; long term, I am really looking forward to the server-based SAS/SAN!
05-21-2013
01:18 PM
Ahmed, Chris, et al.: Thanks for all your feedback. I feel a bit foolish. During my research it appeared to me that hash tables would be the best route, and I spent a lot of time trying to figure them out. What I had not tried was simply creating an index. I finally did that yesterday. Creating the index took about 12 hours (long, but at least not 3 days!). After the index was created, I was amazed at how fast subsetting and summarizing went. A trial of proc summary for one serial number by day after the index took only about 4 minutes! So now I have a macro to loop through the serial numbers, and I have turned it loose. Lesson learned!
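For reference, a minimal sketch of the indexing approach described here, assuming the table lives in the WORK library (the dataset and variable names follow the other posts in this thread; the serial number is just an example):

   /* Build a simple index on the subsetting variable. */
   proc datasets library=work nolist;
      modify MyLargeDataset;
      index create equipmentsernum;
   quit;

   /* With the index in place, a WHERE clause on the indexed
      variable lets SAS read only the matching observations
      instead of scanning the whole table. */
   proc summary data=MyLargeDataset nway;
      where equipmentsernum = "296737";
      class equipmentsernum Date flag;
      var _numeric_;
      output out=OneUnitSummary (drop=_type_ _freq_)
         n= min= max= mean= std= /autoname;
   run;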
05-20-2013
05:11 PM
Ahmed, thanks for the reply. Your assumptions are correct: I am running SAS on Win7 64-bit, quad core, 8 GB RAM, and a 24 TB NAS drive. Based on your comments, it sounds like SPDE might not benefit me. I am in the process of indexing the dataset, and I am hoping that will help speed up the subsetting process. In the meantime, I am having a hard drive shipped that contains the individual files for each turbine. We are also in the process of installing a server-based SAS installation, but that is probably a month from being complete. The good news: it's a learning experience. FG
05-20-2013
04:08 PM
Gady, thanks for the info. We'll look into SPDE and figure out how to set it up. That sounds promising. If you have links to specific documents explaining SPDE, that would be helpful. Otherwise, it's off to Google searching I go.
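For reference, SPDE setup is mostly a LIBNAME exercise. A minimal sketch, with hypothetical paths; SPDE partitions a table across the DATAPATH= locations so reads can be parallelized:

   /* Hypothetical paths; point DATAPATH= at separate physical
      disks where possible so partitions can be read in parallel. */
   libname bigdat spde 'D:\spde\meta'
      datapath=('D:\spde\data1' 'E:\spde\data2')
      indexpath=('D:\spde\index')
      partsize=512;   /* partition size in megabytes */

   /* Copy the table into the SPDE library once; subsequent
      reads go through the parallel engine. */
   data bigdat.MyLargeDataset;
      set MyLargeDataset;
   run;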
05-20-2013
01:07 PM
Hi Gady, it appears to me that the default for MEMSIZE is 0, which allocates the maximum memory available, so it is not clear to me what benefit changing MEMSIZE would have. I am not familiar with SPDE, but I have to assume it is a separate product, and I do not believe we have a license for it, so it's not a choice. As for subsetting the data into smaller datasets, I did try that, but it took 3 days to subset one serial number. With the number of unique serial numbers, at that rate it would take over 6 years to complete the file splits... talk about job security, LOL. Fortunately, it appears that I can get a drive sent to me that contains the separate files by serial number. I'll have to import those as separate datasets and then use a macro to loop through the individual datasets and run proc means on each one separately. At this point, I don't see another realistic approach, but if you have other ideas, I would love to try them. Thanks, FG
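As an aside, MEMSIZE is a startup-only option (set in the config file or on the command line, e.g. sas -memsize 0), but the current setting is easy to check from inside a session:

   /* Print the current MEMSIZE value to the log. */
   proc options option=memsize value;
   run;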
05-20-2013
11:40 AM
Haikuo: I tried your approach and got an "insufficient memory to execute DATA step program" error. I liked your approach and learned a lot from your code, so I was bummed when I got the message. I suspect I may have to go back to the data source and see if they can supply the data in segmented form. We are installing a server-based SAS system, and I probably could handle the file once that is done, but timing-wise that may be a bit late. So I am still trying to figure out a way to split it in a reasonable time frame. Thanks, FG
05-20-2013
09:31 AM
1 Like
Before I try summarizing, I wrote a short program to try to split the dataset into individual datasets for each equipmentsernum. Here is the program; please feel free to critique and offer suggestions on better programming:

   %macro SplitFile;

      /* get the list of unique serial numbers */
      proc sql noprint;
         select equipmentsernum
            into :sernum separated by " "
            from unique_serial_number;
      quit;

      /* second list, prefixed with sn_, for valid SAS dataset names */
      proc sql noprint;
         select "sn" || "_" || equipmentsernum
            into :tag separated by " "
            from unique_serial_number;
      quit;

      %let i = 1;
      %do %while(%scan(&sernum, &i) ne );
         data %scan(&tag, &i);
            set MyLargeDataset;
            where equipmentsernum = "%scan(&sernum, &i)";
         run;
         %let i = %eval(&i + 1);
      %end;

   %mend SplitFile;

The program works, but I let it run all weekend, and in 58 hours it had only just finished one serial number. Would it be beneficial to use hash tables in conjunction with this macro to speed up the file split? Or do you still think summarizing the data with a subset of variables would be faster? Other ideas? Thanks, FG
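One observation on the design: each pass of the %DO loop rescans the full table, so roughly 800 serial numbers means roughly 800 full reads. A hedged one-pass alternative, assuming the table can be read in equipmentsernum order (sorted, or via an index) and that one serial number's rows fit in memory, is to stream each BY group through a hash object and write it out with the OUTPUT method:

   data _null_;
      if _n_ = 1 then do;
         /* obs=0 loads no rows; it only supplies the variable
            layout so defineData(all:'y') knows the column list */
         declare hash h(dataset:'MyLargeDataset(obs=0)', multidata:'y');
         h.defineKey('equipmentsernum');
         h.defineData(all:'y');
         h.defineDone();
      end;
      /* read one BY group per DATA step iteration */
      do until (last.equipmentsernum);
         set MyLargeDataset;
         by equipmentsernum;   /* requires sorted or indexed input */
         h.add();
      end;
      /* write this group to its own dataset, then empty the hash */
      h.output(dataset: cats('sn_', equipmentsernum));
      h.clear();
   run;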
05-17-2013
12:13 PM
ballardw: If I understand you correctly, you are suggesting that I do a proc summary on a subset of the 290 variables in the table, as in the example below with just 5 sensors, but perhaps breaking the 290 variables into groups of 10 or 20?

   proc summary data=MyLargeDataset nway;
      class equipmentsernum Date flag;
      var sensor1 sensor2 sensor3 sensor4 sensor5;
      output out=SummaryDataset (drop=_type_ _freq_)
         sum= max= min= median= mean= std= kurt= skew= n= /autoname;
   run;
05-17-2013
10:03 AM
Reeza, I am now running proc freq as you suggested. I imagine it will take a while to get through the data (if it gets through all of it). Will report back when something happens. FG
05-16-2013
06:15 PM
Tom, in this case the DATE variable is just that, MMDDYYYY. It was created as a subset of a date/time stamp for exactly this reason. Running the proc summary above, we ran out of memory. We do have more memory on order, but we have to wait on purchase-order approvals, then sourcing, purchasing, etc., so I was hoping to find a quicker way to subset the data. If I can subset the data quickly into tables by equipmentsernum, then SAS can handle that file size without too much problem. Thanks, FG
05-16-2013
05:07 PM
Astounding: I did mention the date values above (every 5 minutes, so 288 observations or date values per day). So the summary function would have to summarize the 288 obs for each day for each equipmentsernum. We did try running the following:

   proc summary data=MyLargeDataset nway;
      class equipmentsernum Date flag;
      var _numeric_;
      output out=SummaryDataset (drop=_type_ _freq_)
         sum= max= min= median= mean= std= kurt= skew= n= /autoname;
   run;

We ran out of memory after 30 hours. By the way, the flag is either 1 or 0 (1 = full speed, 0 = partial speed). Would it be better to do a BY equipmentsernum Date flag instead of NWAY? FG
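On the BY-versus-NWAY question raised here: CLASS builds the summaries for all groups in memory at once, while BY holds only one group at a time, at the cost of requiring sorted input. A hedged sketch of the BY variant, assuming the table is (or can be) sorted by the grouping variables:

   /* BY-group processing: only one equipmentsernum/Date/flag group
      is held in memory at a time. Input must be sorted this way. */
   proc summary data=MyLargeDataset;
      by equipmentsernum Date flag;
      var _numeric_;
      output out=SummaryDataset (drop=_type_ _freq_)
         sum= max= min= median= mean= std= kurt= skew= n= /autoname;
   run;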
05-16-2013
04:37 PM
Thanks, Astounding (that sounds strange). We did think about PROC SQL, and we did try a test subset using proc sql for one specific equipsernum; that query alone took about 30 hours (a bit over 1 day). There are almost 800 unique ids in equipsernum, which means it would take over 2 years to subset the entire dataset. I was hoping for something a bit speedier, LOL. The sensor data is stored at 5-minute intervals, which means there are 288 observations per day. Thanks, FG
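For context, the test described here was presumably along these lines (the serial number is just an example taken from elsewhere in the thread); without an index, a query like this forces a full scan of the 3 TB table, which is why it took about 30 hours:

   proc sql;
      create table one_unit as
         select *
         from MyLargeDataset
         where equipmentsernum = '296737';
   quit;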
05-16-2013
02:32 PM
EJ, the data is static, and we just need to generate summaries to create smaller datasets that we can work with. As for saving the hash table, my only thought was that if I am going to loop through by id, subset, and then run proc summary, I would either need to save the hash table or regenerate it for each loop, and regenerating it for each loop seems inefficient. But again, that assumes my approach is valid. FG
05-16-2013
02:08 PM
OK, before someone slaps me for doing something stupid: I realized that defineData(all:'y') with a dataset that large was crazy. So I have removed it and am currently running the following against the 3 TB dataset, just to see if I can create the hash table. But my questions still hold: is there a better approach? Can I save the hash table? Can I loop through the hash table to subset the data? Links to hash table tutorials? Thanks, F:

   data hash_results;
      set myLargeDataset;
      if _n_ eq 1 then do;
         declare hash a(dataset:'myLargeDataset', multidata:'y');
         a.defineKey('equipmentsernum');
         a.defineData('equipmentsernum', 'Date');
         a.defineDone();
      end;
   run;
05-16-2013
01:31 PM
Does anyone have links to a good beginner tutorial on hash tables? I have been Googling and reading a lot, but the papers I have found so far are either quite specific or vague enough that I find it difficult to understand the overall structure.

I have a dataset with 530 million observations and 250+ columns of sensor data (~3 TB). The powers that be want stats summaries on ALL of the columns (n, min, max, mean, stddev, skew, kurtosis, var) by equipment id by date. Being new to SAS, I did a lot of research, and it appears that hash tables would be the best approach, but several aspects of the programming are not clear to me.

My initial approach (and please direct me if there is a better one) is to use the hash table to subset the data by id (or id/date) and then run proc summary on the subset. I tried running the hash subset and ran out of memory (Win7, 8 GB memory).

   data hash_results;
      set myLargeDataset;
      if _n_ eq 1 then do;
         declare hash a(dataset:'myLargeDataset');
         a.defineKey('equipmentsernum', 'Date');
         a.defineData(all:'y');
         a.defineDone();
      end;
      equipmentsernum = '296737';
      if (a.find() eq 0);
   run;

This code works on a subset of myLargeDataset, but on the big set it quickly runs out of memory. Some things I haven't figured out with hash tables:
1) Can I save the resulting hash table to re-use outside of the data step?
2) Can I write a macro to loop through the hash?
My thought was to use the hash table to subset myLargeDataset into a smaller table of just one serial number or id, then call proc summary to get stats for that unit, then loop through to the next serial number, etc. Any hash tutorials or pointers would be greatly appreciated. Regards, Fred
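On question 1: a hash object itself lives only for the duration of its DATA step, but its OUTPUT method can write the contents to an ordinary dataset for re-use. A minimal sketch, assuming only the key columns are stored (the full 3 TB table would never fit; even two columns over 530 million rows is a large in-memory load):

   data _null_;
      if 0 then set myLargeDataset;   /* supplies host variables at compile time */
      declare hash a(dataset:'myLargeDataset');
      a.defineKey('equipmentsernum', 'Date');
      a.defineData('equipmentsernum', 'Date');
      a.defineDone();
      a.output(dataset:'hash_contents');   /* persist the key/data pairs */
      stop;
   run;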