About lydiawawa

lydiawawa · ‎11-23-2020

thank you

reprui · ‎03-02-2020

Hi! I'm not sure this will answer your specific question, but it may offer an alternate path if all you are trying to do is read STRING-formatted column data in Hive with SAS.

Suggestion: Try using a LIBNAME statement to connect to your Hadoop session first (implicit pass-through method). When doing so, use the DBMAX_TEXT= LIBNAME option, set to whatever length "x" you want. (You can also use the DBSASTYPE= data set option to fix columns one at a time.) This will truncate all STRING columns to "x" bytes when SAS reads them. Here's an example:

    libname hdp hadoop
      port=10000
      schema=default
      host="sasserver.demo.sas.com"  /* replace with your server info and the other connection values from your CONNECT TO configuration (explicit pass-through method) */
      sql_functions=all
      dbmax_text=100                 /* execute the LIBNAME statement with the DBMAX_TEXT= option */
    ;

    /* Similar to DESCRIBE in Hive, so you can see which STRING items got truncated. */
    proc contents data=hdp.deno3;
    run;

    /* This will deliver your counts. Watch out for high cardinality; other procs may be better suited for counting depending on the column values (e.g. PROC MEANS for numeric data). */
    proc freq data=hdp.deno3;
      table name STATEFP COUNTYFP;
    run;

Good luck!
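For the column-at-a-time route mentioned in this post, a minimal sketch of the DBSASTYPE= data set option might look like the following. It assumes the hdp libref and the deno3 table from the example are available; the 50-byte length for the name column is an arbitrary value picked for illustration.

```sas
/* Assumes the hdp libref from the LIBNAME example is already assigned.   */
/* CHAR(50) is an illustrative length, not a value taken from the thread. */
proc freq data=hdp.deno3(dbsastype=(name='CHAR(50)'));
  table name;
run;
```

Unlike DBMAX_TEXT=, which applies to every STRING column read through the libref, this only changes how the listed column is read in this one step.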

lydiawawa · ‎02-29-2020

I will let you know by Monday. Thank you so much!

Tom · ‎02-25-2020

Normally you would do that in two steps: one to combine the datasets that use ITEM and STATE as the keys, then a second step to combine the datasets that use ITEM as the key.

    data one;
      input item $ state $ county $ by_county;
    cards;
    Apple DC bb 3
    Apple DC cc 2
    Apple MD aa 4
    Pear VA cc 6
    ;

    data two;
      input item $ state $ by_state;
    cards;
    Apple DC 5
    Apple MD 4
    Pear VA 6
    ;

    data three;
      input item $ by_item;
    cards;
    Apple 9
    Pear 6
    ;

    data step1;
      merge one two;
      by item state;
    run;

    data want;
      merge step1 three;
      by item;
    run;

Do you really want the value of BY_ITEM to appear on only one of the multiple rows with that value of ITEM? Here is a trick to do that.

    data step1;
      merge one two;
      by item state;
      output;
      call missing(of _all_);
    run;

    data want;
      merge step1 three;
      by item;
      output;
      call missing(of _all_);
    run;

If (and this is a BIG if) you are positive that THREE has one and only one observation for every value of ITEM in ONE or TWO, then you can try using this to combine in one step. But I would NOT use it for anything serious, just as a check to see if it could work.

    data want ;
      merge one two;
      by item state;
      if first.item then set three;
      output;
      call missing(of _all_);
    run;

To take that crazy idea even further: if you are sure that TWO has one and only one observation for every ITEM/STATE combination that is in ONE, then your code could become:

    data want ;
      set one ;
      by item state;
      if first.state then set two;
      if first.item then set three;
      output;
      call missing(of _all_);
    run;

lydiawawa · ‎01-08-2020

Thank you all for looking into this case. It is caused by the missing parentheses, and I would definitely use @r_behata for checks. Sorry for the kinda "stupid" question..

Kurt_Bremser · ‎11-07-2019

400 gigabytes? Then the 10 minutes are in fact very fast, as you read 1 GB in about 1.5 seconds.

@lydiawawa wrote: The dataset size is about 407833 million bytes. Page size: 65536
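The arithmetic behind that estimate can be checked with a short DATA step. This is only a sketch: it takes the two figures quoted in the thread (about 407833 million bytes, 10 minutes) and assumes decimal gigabytes; the variable names are made up for the illustration.

```sas
/* Figures quoted in the thread: about 407833 million bytes read in 10 minutes. */
data _null_;
  bytes      = 407833e6;            /* data set size in bytes               */
  seconds    = 10 * 60;             /* elapsed time in seconds              */
  gigabytes  = bytes / 1e9;         /* decimal gigabytes (1 GB = 1e9 bytes) */
  sec_per_gb = seconds / gigabytes; /* seconds needed per gigabyte          */
  put sec_per_gb= 5.2;
run;
```

Under those assumptions the result comes out just under 1.5 seconds per gigabyte, which matches the figure in the reply.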

hashman · ‎11-07-2019

@ChrisNZ: Have read through it and detected nothing that would deviate from my knowledge. Thanks!

hashman · ‎11-01-2019

@lydiawawa: There exists a fairly widespread illusion that one needs to know how to use the hash object only when one has a lot of data to process. Surely under many circumstances hash tables can speed things up quite a bit, for example by making it unnecessary to sort large files when data need to be combined or aggregated. However, the main strength of the hash object in general is that it is an extremely flexible and convenient tool for dynamic programming, frequently lending itself to accomplishing in one step and/or a single pass what otherwise would require several, and doing it using simpler and more straightforward logic to boot.

The task you've posted in this thread is a good illustration. Imagine that your data are unsorted and look at the hash program doing what you want:

    data have_unsorted ;
      input id $ date $ ;
      cards ;
    2 1/3/2006
    2 1/1/2005
    2 1/1/2005
    2 1/1/2005
    2 1/2/2005
    1 1/1/2001
    1 1/2/2002
    2 1/2/2005
    2 1/3/2006
    1 1/2/2002
    1 1/2/2003
    1 1/2/2003
    1 1/1/2001
    ;
    run ;

    data want_unsorted ;
      if _n_ = 1 then do ;
        dcl hash h () ;
        h.definekey ("id", "date") ;
        h.definedata ("count") ;
        h.definedone () ;
      end ;
      set have_unsorted ;
      if h.find() ne 0 then count = 1 ;
      else count + 1 ;
      h.replace() ;
    run ;

If you're unfamiliar with the hash object, it may look Greek to you, and yet its logic is exceedingly simple. Namely, for each record:

- If a key-value (id,date) is not in the table yet, assign count=1 and store it in the table. Output count=1.
- Otherwise, look in the table and see what count is there for this (id,date). Add 1 to that count value and store the result back in the table for this (id,date), overwriting the previous value of count there. Output the new value of count.

It's that simple. The hash table just keeps track of all previous counts for every (id,date) key-value encountered thus far. And because it automatically grows by 1 item every time a new (id,date) is seen, there's no need to pre-process the input to size it up at compile time, as would be necessary, for example, if an array were used as the count-tracking table instead. Furthermore, when you search the table for the current record's (id,date) value to find what the previous value of count has been, this act of lookup takes the same time regardless of how many items have been stored in the table (say, ten or a million), as this is one of the hash object's properties.

If you're interested, a brief compendium on things of this nature can be found here (penned by @DonH and yours truly): http://support.sas.com/resources/papers/proceedings17/0821-2017.pdf

Kind regards
Paul Dorfman
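For contrast, the conventional route that the post says the hash object avoids (sort first, then count within BY groups) could be sketched as below. This is an illustration and not part of the original answer: the data set names have_small, have_sorted and want_sorted are made up, and the input is a shortened subset of the sample data.

```sas
/* Shortened, hypothetical subset of the sample data from the post. */
data have_small ;
  input id $ date $ ;
  cards ;
2 1/3/2006
2 1/1/2005
1 1/1/2001
2 1/1/2005
1 1/1/2001
;
run ;

/* Step 1: sort so that equal (id,date) keys become adjacent. */
proc sort data = have_small out = have_sorted ;
  by id date ;
run ;

/* Step 2: running count within each (id,date) BY group. */
data want_sorted ;
  set have_sorted ;
  by id date ;
  if first.date then count = 1 ;
  else count + 1 ;
run ;
```

This takes two steps and returns the rows in sorted order, whereas the hash version needs a single pass and keeps the original record order.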

lydiawawa · ‎10-31-2019

wow you guys are like saints. Thank you so much for helping me understand the ambiguities.

lydiawawa · ‎10-03-2019

Hi Kurt, I will be sure to do that next time. Thank you so much for the help.

ScottBass · ‎07-18-2019

@Reeza wrote: I definitely don't need any more answers, feel free to mark Scott's answer as correct 🙂

Yeah, @Reeza's got 28K+ posts, I've got 653 🙂 Throw me a bone!

@lydiawawa, glad your problem is solved. I revisited your original post: "I have a time variable in the format of datetime32.4 (ex: 21MAR2019:10:19:15.2970)"

Beware of fractional seconds falling through the "gaps" in the format. You may need to truncate the fractional seconds from your data and then apply the format, or else expand the format ranges to include the granularity of your source data.

Edit: Actually, the only gap is for 23:59:59, since all the other end points use the -< operator. If you're worried about fractional seconds, perhaps change "23:59:59"t to "23:59:59.999999"t (and test!)
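One way to sidestep the gap altogether is to make every range half-open, including the last one. The sketch below is an assumption-laden illustration: the format name timeblk, the block labels and the boundaries are all made up, since the original format is not shown here. Only the sample datetime value comes from the post.

```sas
/* Hypothetical format: name, labels and boundaries are illustrative only. */
proc format ;
  value timeblk
    '00:00:00't -< '12:00:00't = 'Morning'
    '12:00:00't -< '18:00:00't = 'Afternoon'
    '18:00:00't -< 86400       = 'Evening'   /* 86400 seconds = midnight, excluded */
  ;
run ;

data _null_ ;
  dt    = '21MAR2019:10:19:15.2970'dt ;  /* sample value from the post          */
  t     = timepart(dt) ;                 /* time of day, fraction still present */
  block = put(t, timeblk.) ;
  put block= ;
run ;
```

Because the final range runs up to, but does not include, 86400 seconds, a value such as 23:59:59.5 still lands in the last block without relying on a hard-coded number of decimal places.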

PeterClemmensen · ‎06-06-2019

A data step approach:

    data want(drop=flag);
      do until (last.session);
        set have;
        by session;
        if code=7.1 then flag=1;
      end;
      do until (last.session);
        set have;
        by session;
        if flag ne 1 then output;
      end;
    run;

Ksharp · ‎05-25-2019

if index(vvalue(timestamp), 'AM') then td = "AM";

lydiawawa · ‎05-21-2019

For some reason this worked, but it also removed all rows with missing session and grp values.

Patrick · ‎05-11-2019

@lydiawawa The following code should return what you want IF your source data complies with the following requirements:

- A new group always starts with the value "DASHBOARD_LOAD_TIME"
- Each group has a row with the value "DASHBOARD_<digits>_START_TIME"

    data have;
      infile cards truncover;
      input Name :$30.;
      cards;
    DASHBOARD_LOAD_TIME
    DASHBOARD_LANG_ENTRY_IND
    DASHBOARD_1_START_TIME
    DASHBOARD_LANG_EXT_IND
    DASHBOARD_CLOSE_BROWSER_TIME
    DASHBOARD_LOAD_TIME
    DASHBOARD_LANG_ENTRY_IND
    DASHBOARD_2_START_TIME
    DASHBOARD_LANG_EXT_IND
    DASHBOARD_CLOSE_BROWSER_TIME
    DASHBOARD_LOAD_TIME
    DASHBOARD_LANG_ENTRY_IND
    DASHBOARD_LANG_EXT_IND
    DASHBOARD_CLOSE_BROWSER_TIME
    DASHBOARD_3_START_TIME
    ;
    run;

    data want(drop=_:);
      length id start_time_num 8;
      retain start_time_num;
      id=_n_;
      set have;
      if name='DASHBOARD_LOAD_TIME' then do;
        call missing(start_time_num);
        /* look ahead from the next row for this group's START_TIME row */
        do _point=_n_+1 to _nobs;
          set have(keep=name rename=(name=_name)) point=_point nobs=_nobs;
          if prxmatch('/DASHBOARD_\d+_START_TIME/oi',_name)>0 then do;
            start_time_num=input(scan(_name,2,'_'),?? best32.);
            leave;
          end;
        end;
      end;
    run;

Online Status: Offline
Date Last Visited: 03-17-2023 03:41 AM

Re: How to create multiple variables with date suffix in a do loop

How to create multiple variables with date suffix in a do loop

Intck does not give the desired output

Scan does not work inside of CATX

Re: Unable to send email via echo to show SAS session status

Unable to send email via echo to show SAS session status

Re: Issue with importing text parameter file using input statement

Re: Intck does not give the desired output

Re: Output frequency counts to compare all variables in two datasets

Re: Retain non-missing by specific string to fill in missings

SAS Raking macro closes all programs

Re: Records by condition exist in proc freq, but subset by condition r...

Re: SAS/Access columns could have a length in SAS of 32767 extract 0 r...

Re: How to transpose and calculate percent of total by level

Re: Multiple joins by first row

Re: Unable to convert Hive table to SAS table

Re: How to subset a very large dataset by date

Re: Removing duplicates by removing reappeared chunk of records (could...

Re: Identify duplicates by datastep and proc sql

Re: Convert proc sort nodupkey to proc sql

Re: Comparing values in groups

Re: Format time in grouped blocks

Re: Remove group by conditions

Re: How to extract "AM" or "PM" from a time variable

Re: One to many merge error

Re: How to filling in missing by the none-missing value in the middle?