@Kurt_Bremser:
I'm not actually surprised.
1. As far as SORT goes, it may be a "reasonable number", but it's nowhere near the area where SORT starts hitting its usual bottlenecks. With 5 short variables, SORT will likely outperform anything else well past 10 million obs and then some. And I don't think it will even have to make use of multi-threading, because with a composite key cardinality of 15 it just keeps all its internal bookkeeping in memory. In fact, it will try to use memory for its bookkeeping all the way up to SORTSIZE, and only after that limit is reached will it start offloading parts of it to utility files in the SORTWORK space. That's why nowadays, when memory is huge even on small hardware, you see SORT grabbing so much RAM in order to speed things up. This is in sharp contrast with the days of yore, when memory was scant (just 20 years ago on the mainframe it was normal to see REGION set to 256K), and the biggest advantage of sorting (and the reason so much effort had been dedicated to optimizing its algorithms) was that, in the end, it could get the job done using very little memory (albeit at the expense of processing time and with its bookkeeping kept on disk or even tape), while a fast lookup table approach that would require keeping all distinct keys in memory would fail for lack of it.
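For reference, the knob in question is the SORTSIZE system (or PROC SORT statement) option; a purely illustrative setting (the 4G value and the key variables K1-K5 are made up):
options sortsize = 4G ;   /* let SORT keep its bookkeeping in memory up to 4 GB */
proc sort data = have ;   /* it spills to SORTWORK utility files only past that */
  by k1 k2 k3 k4 k5 ;     /* hypothetical 5-variable composite key              */
run ;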
In my real-world experience, SORT used for deduping starts sputtering somewhere around 50 GB and up. At one org I consulted for, there were up to 10 extracts, between 50 and 100 GB each, from SQL Server that needed to be deduped as the first ETL cleansing step. It was taking SORT 1.5 to 2 hours of real time apiece, not to mention constant nagging problems with the SORTWORK space (on Linux), and as a result, the ETL would simply fail to fit into the off-peak time window during which it had to run to make the data available in the morning. The problem was solved by keeping only the keys plus the RID and sort-deduping just those, then using the RIDs to mark the duplicate records in the original extracts for deletion. Since it reduced each dedup process to no more than 3 minutes, it enthused me enough to describe the method here:
https://support.sas.com/content/dam/SAS/support/en/sas-global-forum-proceedings/2018/2426-2018.pdf
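Schematically, the idea looks like this (a sketch only, not the paper's exact code: K1 and K2 stand in for whatever the real dedup key is, and instead of marking records for deletion in place, the dupes are simply filtered out into WANT):
data keys (keep = k1 k2 rid) / view = keys ;
  set have ;
  rid = _n_ ;
run ;
proc sort nodupkey data = keys out = keyset dupout = dupes (keep = rid) ;
  by k1 k2 ;             /* KEYSET gets the distinct keys, DUPES gets the RIDs of the extra copies */
run ;
data want ;
  if _n_ = 1 then do ;
    dcl hash h (dataset: "dupes") ;
    h.definekey ("rid") ;
    h.definedone () ;
  end ;
  set have curobs = rid ;
  if h.check () ne 0 ;   /* keep only the records whose RIDs are not among the dupes */
run ;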
In the present OP's case, where all the variables form the composite dedup key, the concept can still be applied by combining it with MD5 to create a single $16 key variable and thus reduce the sorting workload. In this case, though, the number of dupes greatly exceeds the number of records to keep, so it's better to rewrite HAVE into WANT using the RIDs corresponding to the distinct records rather than using the RIDs corresponding to the dupes to mark the records in HAVE for deletion. For example:
/* Sample HAVE: 1.5 * N rows drawn at random (with replacement) from sashelp.class, so dupes are guaranteed */
data have ;
  do _n_ = 1 to 1.5 * n ;
    _iorc_ = ceil (ranuni (1) * n) ;
    set sashelp.class nobs = n point = _iorc_ ;
    output ;
  end ;
  stop ;
run ;
/* View V: an MD5 signature of the whole record plus its RID (observation number in HAVE) */
data v (keep = ___:) / view = v ;
  set have ;
  array nn _numeric_ ;
  array cc _char_ ;
  ___m = put (md5 (catx (":", of nn[*], of cc[*])), $16.) ;
  ___rid = _n_ ;
run ;
proc sort nodupkey data = v out = nodup (keep = ___rid) ;
  by ___m ;
run ;
/* Keep only the records whose RIDs correspond to the distinct signatures */
data want ;
  if _n_ = 1 then do ;
    dcl hash h (dataset:"nodup") ;
    h.definekey (all:"y") ;
    h.definedone () ;
  end ;
  set have curobs = ___rid ;
  if h.check() = 0 ;
run ;
2. The hand-coded hash outperforms the hash object in this case because the array hash table is extremely sparse: 760007 slots for 15 distinct keys, so most likely there are no collisions at all, and the DO loop terminates after the first iteration. It means that, effectively, it is as fast as key-indexing, except for the rather high cost of computing the hash function (though that is mitigated by the fact that MD5, the costliest part of it, has to be computed anyway).
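For illustration, such an array-based, open-addressing dedup might look roughly like this (a sketch only, not the code posted upthread; the probing scheme, the PIB6. hash, and the names are my assumptions, and the result goes to WANT2 to keep it apart from the step above):
data want2 (drop = ___:) ;
  array hk [0:760006] $ 16 _temporary_ ;             /* sparse table of $16 key slots   */
  set have ;
  array nn _numeric_ ;
  array cc _char_ ;
  length ___m $ 16 ;
  ___m = md5 (catx (":", of nn[*], of cc[*])) ;      /* MD5 signature of the record     */
  ___h = mod (input (___m, pib6.), 760007) ;         /* hash = MOD of its first 6 bytes */
  do while (hk[___h] ne " " and hk[___h] ne ___m) ;  /* linear probing on collision     */
    ___h = mod (___h + 1, 760007) ;
  end ;
  if hk[___h] = " " then do ;                        /* empty slot: first occurrence    */
    hk[___h] = ___m ;
    output ;
  end ;
run ;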
Also, array-based hashing always outperforms the hash object by a healthy margin, both in terms of time and memory footprint, if the key is a single integer variable - or a combination of keys that can be mapped to a single integer one-to-one very inexpensively. This is because in this case the hash function is nothing but MOD, so it's very fast, which makes the entire algorithm very fast as well. Memory-wise, the shortest hash object item, even with a single numeric key, is 48 bytes, while one array item for such a key is 8 bytes. Therefore, even if we make the array 3 times sparser vis-a-vis the number of keys stored in it (i.e. 24 bytes per key stored), its memory footprint is still half that of the hash object table.
Furthermore, such an open-addressed hash table is faster than the hash object for data aggregation, and the reason is simple: with the former, we can add directly to the array item, while with the latter, we first need to copy the value from the hash item into the PDV, add to it there, and then replace the item's value with the new one.
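To make the contrast concrete, here's a minimal sketch under assumed names: a hypothetical data set TRANS with a non-missing, nonnegative integer key ID and a numeric AMOUNT to be summed, and an arbitrary prime table size that comfortably exceeds the number of distinct keys.
%let hsize = 999983 ;                                /* table size (assumed)            */
data totals (keep = id total) ;
  array hk [0:%eval(&hsize - 1)] _temporary_ ;       /* key slots                       */
  array hs [0:%eval(&hsize - 1)] _temporary_ ;       /* running sums, one per key slot  */
  do until (eof) ;
    set trans end = eof ;
    _k = mod (id, &hsize) ;                          /* hash function = MOD             */
    do while (hk[_k] ne . and hk[_k] ne id) ;        /* linear probing on collision     */
      _k = mod (_k + 1, &hsize) ;
    end ;
    hk[_k] = id ;
    hs[_k] = sum (hs[_k], amount) ;                  /* add straight into the slot      */
  end ;
  do _k = lbound (hk) to hbound (hk) ;               /* unload the aggregates           */
    if hk[_k] ne . then do ;
      id    = hk[_k] ;
      total = hs[_k] ;
      output ;
    end ;
  end ;
  stop ;
run ;
/* The same aggregation with the hash object: the running total has to be
   fetched into the PDV with FIND, updated there, and written back with REPLACE. */
data _null_ ;
  dcl hash h () ;
  h.definekey ("id") ;
  h.definedata ("id", "total") ;
  h.definedone () ;
  do until (eof) ;
    set trans end = eof ;
    if h.find () ne 0 then total = . ;               /* copy the item's data into PDV   */
    total = sum (total, amount) ;                    /* update in the PDV               */
    h.replace () ;                                   /* write it back into the item     */
  end ;
  h.output (dataset: "totals_hash") ;
  stop ;
run ;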
Thanks for the nice discussion.
Kind regards
Paul D.