saivenkat
Obsidian | Level 7
Member since
08-23-2018
- 16 Posts
- 6 Likes Given
- 2 Solutions
- 3 Likes Received
Activity Feed for saivenkat
- Got a Like for Re: Accessing db2 column that has name more than 18 characters. 11-23-2018 07:11 AM
- Posted Re: Accessing db2 column that has name more than 18 characters on Administration and Deployment. 11-23-2018 06:44 AM
- Posted Re: Accessing db2 column that has name more than 18 characters on Administration and Deployment. 11-22-2018 06:44 AM
- Posted Accessing db2 column that has name more than 18 characters on Administration and Deployment. 11-22-2018 03:41 AM
- Posted Re: How does a Linux batch job connect to metadata? on Administration and Deployment. 10-02-2018 04:24 PM
- Posted Re: Loop through datasets and variables in source file on New SAS User. 10-02-2018 07:52 AM
- Posted Re: Performance measurement on SAS Data Management. 10-01-2018 01:42 AM
- Got a Like for Re: Dealing with repeated observations for the same ID. 09-30-2018 10:51 PM
- Posted Re: Performance measurement on SAS Data Management. 09-30-2018 01:58 PM
- Posted Re: Dealing with repeated observations for the same ID on SAS Programming. 09-30-2018 12:08 PM
- Posted Re: Performance measurement on SAS Data Management. 09-28-2018 01:03 PM
- Posted Performance measurement on SAS Data Management. 09-28-2018 11:13 AM
- Liked Re: Drop variables contain specific string or missing values for Kurt_Bremser. 09-27-2018 04:09 AM
- Liked Re: Drop variables contain specific string or missing values for himalayan. 09-27-2018 04:09 AM
- Liked Re: Drop variables contain specific string or missing values for novinosrin. 09-27-2018 04:09 AM
- Liked Re: Drop variables contain specific string or missing values for novinosrin. 09-27-2018 04:09 AM
- Posted Re: Drop variables contain specific string or missing values on SAS Programming. 09-27-2018 04:06 AM
- Posted Re: Drop variables contain specific string or missing values on SAS Programming. 09-26-2018 02:27 PM
- Posted Re: Drop variables contain specific string or missing values on SAS Programming. 09-26-2018 02:19 PM
- Posted Drop variables contain specific string or missing values on SAS Programming. 09-26-2018 12:31 PM
11-23-2018
06:44 AM
2 Likes
The issue was resolved by upgrading the DB2 client on the machine where SAS Compute is installed.
11-22-2018
06:44 AM
Hi, The table CLAIMJOB_CCD has 59 columns. Only the column CAUSALPARTNOPREFIXWID exceeds the 18-character length, and another column named CAUSALPARTNOPREFIX also exists, so the access issue persists for this table. SQL pass-through works fine, but it drops the column CAUSALPARTNOPREFIXWID with the warning below:

proc sql;
connect to db2 as db1 (database=xx USER=xxx PASSWORD="{SAS002}xxx");
create table test as
select *
from connection to db1 (SELECT * FROM xxx.CLAIMJOB_CCD FETCH FIRST 3 ROWS ONLY);
disconnect from db1;
quit;

WARNING: Variable CAUSALPARTNOPREFIX already exists on file WORK.TEST.
NOTE: Table WORK.TEST created, with 3 rows and 58 columns.

However, the same table can be accessed directly from the DB2 client environment (both the client (SAS EG/DI) machine and the SAS Compute machine) without any issues. Another SAS environment (QA), which points to the same data source name, has no issue accessing the same table, so I suspect a problem with the production SAS environment, but I am not able to pinpoint where the problem is. Thanks,
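A possible workaround, sketched under the assumption that an explicit column list is acceptable: list the columns in the pass-through query and alias the long name to something unique within 18 characters, so the SAS-side normalized names no longer collide. The table, schema, and original column names come from the post; the alias CAUSALPREFIXWID is hypothetical.

```sas
/* Sketch: alias the >18-character column inside the pass-through
   SELECT so the truncated SAS names stay unique.
   CAUSALPREFIXWID is a hypothetical alias. */
proc sql;
  connect to db2 as db1 (database=xx USER=xxx PASSWORD="{SAS002}xxx");
  create table test as
  select *
  from connection to db1
    (SELECT CAUSALPARTNOPREFIX,
            CAUSALPARTNOPREFIXWID AS CAUSALPREFIXWID
     FROM xxx.CLAIMJOB_CCD
     FETCH FIRST 3 ROWS ONLY);
  disconnect from db1;
quit;
```

This sidesteps the name collision but does not explain why only the production environment is affected.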
11-22-2018
03:41 AM
Hi Everyone, The error below is thrown while accessing the DB2 table:

ERROR: This DBMS table or view cannot be accessed by the SAS System because it contains column names that are not unique when a SAS normalized (uppercased) compare is performed. See "Naming Conventions" in the SAS/ACCESS documentation.

Analysis: the DB2 table has two different columns named CAUSALPARTNOPREFIX and CAUSALPARTNOPREFIXWID. The name CAUSALPARTNOPREFIXWID exceeds 18 characters, so SAS reads only the first 18 characters. SAS read this column without any issue until last week; the error above started on 19 Nov 2018. We have two environments, QA and production, and the issue persists in production only, so I compared the environments using the steps below:

%macro cmp(command);
filename p pipe &command lrecl=32767;
data _null_;
infile p;
input;
put _infile_;
run;
%mend;
option LS=256;
%cmp("set");

I noticed that the environment variable IBM_DB_HOME was missing in production. I added it to the sasuser configuration and confirmed it appears in production again, but that didn't fix the issue. I then executed the following in both environments for comparison, with no success yet:

options sastrace=',,d,d' sastraceloc=saslog nostsuffix;
proc sql;
describe table tstpro.claimjob_ccd;
quit;

ACCESS ENGINE: Exiting dbidsci with ERROR, rc=0x80121023

The production environment shows the error above, but no details are available on the web to explore. Any tips to debug further would be appreciated. Thanks,
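A small sketch related to the environment comparison above: a variable set in the login shell may not propagate to a SAS session spawned by a server, so it can help to check from inside the session itself whether IBM_DB_HOME is visible to the SAS process. %SYSGET is a standard SAS macro function; the variable name comes from the post.

```sas
/* Print the value of IBM_DB_HOME as seen by this SAS process.
   An empty value here means the session never received the variable,
   regardless of what the login shell shows. */
%put NOTE: IBM_DB_HOME as seen by SAS = %sysget(IBM_DB_HOME);
```

If this prints empty in production but not in QA, the variable needs to be set where the workspace server is launched, not just in the interactive shell.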
10-02-2018
04:24 PM
Check the method below; it could be helpful in debugging:

proc metaoperate
server="sasmetadataservername"
port=8561
userid="id"
password="pwd"
action=status;
run;

We recently fixed an access issue for AD users using the links below; the method above was helpful for confirming that the SAS metadata configuration changes were effective:
http://support.sas.com/kb/39/891.html
http://support.sas.com/kb/51/911.html

Did you manage to find a fix?
10-02-2018
07:52 AM
The program below would be helpful for consolidating the variables that have only missing values across a library:

proc sql noprint;
create table tblscn as select memname, libname from dictionary.tables where libname='TMP';
quit;
%macro loop;
%local num i mv_memname mv_libname;
%let dsid = %sysfunc(open(work.tblscn));
%let num = %sysfunc(attrn(&dsid,nlobs));
%let rc = %sysfunc(close(&dsid));
%do i=1 %to &num.;
data _null_;
p = &i;
set work.tblscn point=p;
call symputx('mv_memname',memname);
call symputx('mv_libname',libname);
stop;
run;
data _null_;
set &mv_libname..&mv_memname. end=last;
array _a{*} $ _character_;
array _b{*} _numeric_;
length varname memname libname $32;
if _n_ = 1 then do;
call missing(var, varn, varname, memname, libname);
declare hash nmc (ordered:"a");
nmc.definekey("var");
nmc.definedata('var','varname','memname','libname');
nmc.definedone();
declare hiter iter('nmc');
do var=1 to dim(_a);
nmc.add(key:var,data:var,data:vname(_a[var]),data:"&mv_memname.",data:"&mv_libname.");
put _all_;
end;
declare hash nmn (ordered:"a");
nmn.definekey("varn");
nmn.definedata('varn','varname','memname','libname');
nmn.definedone();
declare hiter niter('nmn');
do varn=1 to dim(_b);
nmn.add(key:varn,data:varn,data:vname(_b[varn]),data:"&mv_memname.",data:"&mv_libname.");
put _all_;
end;
end;
rc=iter.first();
do while (rc = 0);
if _a[var] ne '' then do;
_x=var;
rc=iter.next();
nmc.remove(key:_x);
end;
else rc=iter.next();
end;
rc=niter.first();
do while (rc = 0);
if _b[varn] ne . then do;
_y=varn;
rc=niter.next();
nmn.remove(key:_y);
end;
else rc=niter.next();
end;
if last then do;
nmc.output(dataset:'mc_ds');
nmn.output(dataset:'mcn_ds');
end;
run;
proc append base=mc_ds data=mcn_ds(rename=varn=var) force; run;
proc append base=all_missing_var data=mc_ds force; run;
%end;
%mend loop;
%loop;
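For a single dataset, a much shorter alternative to the hash-based scan above is PROC FREQ with the NLEVELS option, whose ODS NLevels table reports per variable how many nonmissing levels exist; a variable with zero nonmissing levels is entirely missing. This is a sketch, not the poster's method, and the dataset name tmp.somedata is hypothetical; on very wide tables this can be expensive.

```sas
/* Sketch: find variables that contain only missing values. */
ods select none;                   /* suppress the printed tables  */
ods output nlevels=lvl;            /* capture the NLevels summary  */
proc freq data=tmp.somedata nlevels;
  tables _all_;
run;
ods select all;

proc print data=lvl;
  where NNonMissLevels = 0;        /* entirely-missing variables   */
run;
```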
10-01-2018
01:42 AM
The partitioning technique reduces the elapsed time by 50 to 75% (60 mins reduced to 15 mins in one place, 80 mins reduced to 25 mins in another). Most of the remaining time is consumed by overhead processes like the following:
1) preparing the source data (creating formats, sorting the source data, and determining the partition size when partitions are based on business keys)
2) consolidating the partitioned target data into a single dataset
3) cleaning up the intermediate tables

We have 1 processor chip; the other CPU resources are listed below:

cpuinfo: GenuineIntel Intel(R) Xeon(R) CPU E5-4655 v3 @ 2.90GHz
cpuinfo: Hz=3126.335 bogomips=5808.16
cpuinfo: ProcessorChips=1 PhyscalCores=6
cpuinfo: Hyperthreads =2 VirtualCPUs =48
# of CPUs: 48

Can you highlight the risks of load failures that we may expect in future if the CPU utilization goes beyond 95%? I am more interested in knowing whether pushing CPU utilization up to 100% during off-peak hours (for batch loads), when it is the only process executing, can be considered an optimal way of utilizing the resources. I had seen issues with work tables and views (and applied a fix by replacing these with permanent tables): SAS had difficulties closing the sessions when it had created work tables for processing, as highlighted in my previous post. I have executed it several times and confirmed no errors are thrown. I want to be ready with fixes in advance, if any issue can be expected in future.
09-30-2018
01:58 PM
Thank you very much for the detailed response! I would like a few more insights about memory and CPU utilization. If I allow the SAS programs to process a couple of billion records, free memory varies between 0 and 1% until the process completes, but free CPU never drops below 95%. I applied the partitioning technique for the same amount of data using the dataset options firstobs and obs, and allowed all the partitions to run concurrently. The jobs completed without any issue, and both memory and CPU free percent stayed below 1% until the process completed. Can I consider pushing the CPU utilization beyond 95% an optimal way of processing the data? Do you foresee any risks of load failures in such conditions? I had to ensure the conditions below to make the partitioning technique successful:
1) No temporary datasets in the SAS program
2) No views in the SAS program
3) Partitions distributed uniformly, either by business key or by total obs
Thanks again!
09-30-2018
12:08 PM
1 Like
data want(drop=lID lCatg_1 lCatg_2);
set have;
by id;
lID=lag(ID);
lCatg_1=lag(Catg_1);
lCatg_2=lag(Catg_2);
if lid eq id then do;
if lCatg_1 eq Catg_1 and lCatg_2 eq Catg_2 then delete;
if lCatg_1 ne Catg_1 and lCatg_2 eq Catg_2 then Catg_2='';
if lCatg_1 eq Catg_1 and lCatg_2 ne Catg_2 then Catg_1='';
if Catg_1 eq '' and Catg_2 eq '' then delete;
/* if lCatg_2 eq Catg_2 then delete; */
end;
if not (first.id and last.id) and Catg_1 eq '' and Catg_2 eq '' then delete;
run;
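The thread does not show the input data, so here is a hypothetical have dataset to illustrate the step above; the variable names come from the step, and the input must be sorted by id for the BY statement to work.

```sas
/* Hypothetical sample input for the lag-based dedup step above. */
data have;
  input id Catg_1 $ Catg_2 $;
datalines;
1 A X
1 A X
1 B X
2 A Y
;
run;
proc sort data=have; by id; run;
```

With this input, the exact duplicate for id 1 should be deleted, and the row where only Catg_1 changed should keep Catg_1 but have Catg_2 blanked.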
09-28-2018
01:03 PM
I wanted to compare the IO rate between the Linux dd or cp commands and SAS procedures or data steps. When I executed the step below, it took 18 minutes to process datasets have_a and have_b, where each dataset is about 195 GB, so can I consider the SAS IO rate here to be 368 MB/s? That looks optimal compared to processing in Linux, where SAS Compute is installed.

data work.censusmerge;
set have_a have_b;
by key;
run;

NOTE: There were 2176165646 observations read from the data set have_a.
NOTE: There were 2176165646 observations read from the data set have_a.
NOTE: The data set WORK.CENSUSMERGE has 4352331292 observations and 18 variables.
NOTE: DATA statement used (Total process time):
real time 18:05.02
user cpu time 9:59.07
system cpu time 8:05.97
memory 24984.06k
OS Memory 45516.00k
Timestamp 09/28/2018 09:27:49 PM
Step Count 6 Switch Count 1492
Page Faults 0
Page Reclaims 102891
Page Swaps 0
Voluntary Context Switches 5048
Involuntary Context Switches 1313
Block Input Operations 785244120
Block Output Operations 817162272
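The 368 MB/s figure above can be checked with a quick back-of-the-envelope calculation; note it counts only the read volume, ignoring the roughly 390 GB the step also writes out.

```sas
/* Two ~195 GB inputs read in 18:05 of real time. */
data _null_;
  gb_read  = 2 * 195;          /* total GB read            */
  seconds  = 18*60 + 5;        /* 18:05.02 real time       */
  mb_per_s = gb_read * 1024 / seconds;
  put mb_per_s= 8.1;           /* roughly 368 MB/s         */
run;
```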
09-28-2018
11:13 AM
Hello Everyone! I copied a SAS file in Linux using the command

dd if=have.sas7bdat of=test.sas7bdat

to measure the IO throughput rate of the environment, and noticed the speed varies from 250 MB/s to 400 MB/s:

209194057728 bytes (209 GB) copied, 834.332 s, 251 MB/s

Whereas when I sort the data file using

proc sort data=have; by key; run;

it sorted the data in 621 seconds. The table is about 195 GB, so shall I consider the SAS IO rate to be 321 MB/s? Am I measuring the performance the right way? Can you shed some light on measuring SAS performance in a standard manner, like using the dd command above in a Linux environment? Thanks in advance!
09-27-2018
04:06 AM
data _null_;
set have end=last;
array _a{*} $ _character_;
length name $12;
if _n_ = 1 then do;
call missing(idx, name);
declare hash miss (ordered:"a");
miss.definekey("idx");
miss.definedata('idx','name');
miss.definedone();
declare hiter iter('miss');
do idx=1 to dim(_a);
miss.add(key:idx,data:idx,data:vname(_a[idx]));
end;
end;
rc=iter.first();
do while (rc = 0);
if _a[idx] ne '?' and _a[idx] ne '' then do;
_x=idx;
rc=iter.next();
miss.remove(key:_x);
end;
else rc=iter.next();
end;
if last then miss.output(dataset:'drop');
run; I wanted to drop the columns _a1.._a3 from the sample dataset "have".. since the columns _a1.._a3 has got the values as "x" across the table.. changing the line as if _a[idx] ne 'x' then do; worked fine.. and processing time saved more than 50%. Master piece... Thank you very much!
09-26-2018
02:27 PM
The step should get rid of a variable if it has only missing values or a specific string like '?' across the whole table.
09-26-2018
02:19 PM
It didn't work; can you revisit the steps? The processing time reduced from 16 mins to 4 mins, but the output did not list the columns that have only the string '?'.
09-26-2018
12:31 PM
Hello Everyone, I see many posts around this concept, but I have a unique case: a table with approximately 6000 variables and 2 million records, sizing up to 100 GB. I want to drop a column if it has either only missing values or a single value like '?'; 1000+ of the 6000 columns contain only missing values or '?'. I have been using the following program, which isn't quite optimal. As an alternative, I have been trying to prevent these columns from accumulating in the first place. Can you suggest better options, if you think of any?

data _null_;
set have end=last;
if _n_ = 1 then do;
declare hash cnt();
rc = cnt.definekey('nmc');
rc = cnt.definedata('nmc');
rc = cnt.definedone();
end;
array _a{*} $ _character_;
do i=1 to dim(_a);
if _a{i} ne '?' and _a{i} ne '' then do;
nmc=vname(_a{i});
rc = cnt.find();
if rc then do;
cnt.add();
end;
if last then
rc = cnt.output(dataset: "nmc_bo");
end;
end;
run;
proc sql noprint;
select name into :mv_mc separated by ',' from dictionary.columns where libname='WORK' and memname='HAVE' and type='char' and name not in
(select nmc from nmc_bo);
quit;
%put Note: Columns that will be discarded are &mv_mc;
proc sql noprint;
alter table have drop &mv_mc.;
quit;
proc datasets lib=work nolist;
delete nmc_bo;
quit;
09-24-2018
03:14 PM
I have 2 different tables, and both tables have a couple of billion records. The requirement is to fetch the records from table1 if the record doesn't exist in table2. I felt it is best to find the missing values in the key range that belongs to table2, and finally fetch the records from table1 if they match the missing values or fall beyond that range. Thanks for your prompt response!