The datasets are not identical in metadata and share a few overlapping variable names, so I worry about stacking them: I wouldn't want to accidentally truncate values where the variable names matched but the lengths (and other attributes) disagreed. Because of that, I like your approach of renaming the variables to something new and then renaming them back after the merge. It could work, but I would also have to drop any variables that were in one dataset but not in the appended dataset; this is shown in the second example of the code I originally attached. It would also be easier if I used your renaming method and gave each variable a dataset-specific prefix, which would simplify my KEEP= statements.

I might try using multiple formats to see how that works out. I'm not a fan of needing to concatenate the multiple primary keys into one for the hash index and then doing the same on the base dataset; that seems like a lot of reads to me.

At this point I think I'm just going to sort/sort-merge my lookup tables and split up the larger files prior to merging them, or revisit the settings on my UNIX server. Right now my config file has MEMSIZE=8G, SORTSIZE=1G, BUFNO=20, and BUFSIZE=512K. I'll need to see if I can raise some of these to get the tables to load into the hash faster. It's a virtual machine, so potentially I could have the server admin allocate more memory for this task, or move the job to a higher-powered server. Thanks!
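To make the prefix-rename idea concrete, here is a minimal sketch. The dataset names (WORK.A, WORK.B) and variables (NAME, AMOUNT) are purely illustrative, not taken from the actual data; the point is that prefixing every variable in one dataset before stacking means an overlapping name with a shorter length can never silently truncate the other's values, and the prefixed names make the KEEP= lists easy to write.

```sas
/* Hypothetical datasets WORK.A and WORK.B share the variable NAME,
   but its defined length differs between the two. */

/* Step 1: prefix every variable in B so nothing collides on append. */
proc datasets lib=work nolist;
  modify b;
    rename name=b_name amount=b_amount;  /* one rename pair per variable */
quit;

/* Step 2: stack the datasets; the prefixed variables from B stay
   distinct, so no attribute conflict and no truncation can occur. */
data stacked;
  set a b;
run;

/* Step 3 (after the merge): rename back, or simply keep the prefixed
   set you need -- e.g. keep=b_: selects everything that came from B. */
data final;
  set stacked(keep=b_:);
run;
```

The `b_:` name-prefix list in the KEEP= option is what simplifies variable selection once everything from one source shares a common prefix.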
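On the composite-key concern: the hash object's DEFINEKEY method accepts multiple variables directly, so the primary keys do not have to be concatenated into a single variable on either the lookup table or the base dataset. A sketch, assuming a hypothetical lookup table WORK.LOOKUP keyed on (ID, DATE) with a value VAL, and a base dataset WORK.BASE:

```sas
/* Hash lookup with a multi-variable key: ID and DATE are passed as
   separate arguments to definekey(), so no concatenated key -- and
   no extra pass over either dataset -- is needed. */
data out;
  if _n_ = 1 then do;
    declare hash h(dataset:'work.lookup');
    h.definekey('id','date');   /* composite key, no concatenation */
    h.definedata('val');
    h.definedone();
    call missing(val);          /* avoid uninitialized-variable notes */
  end;
  set work.base;
  if h.find() = 0 then output;  /* keep matching rows only */
run;
```

Whether the lookup table still fits in memory is a separate question (that is where MEMSIZE comes in), but the concatenation step itself can be avoided.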