About sbennet

sbennet · ‎02-04-2013

Hi Astounding, I was originally going to use a hash table to see if I could work out this problem, but the organization of the data limits how well a hash table can be used: it is hard to set up a unique hash key that represents only one set of matching observations. Good luck with your project! Cheers, Scott

sbennet · ‎02-04-2013

Hi PGstats. Yeah, that is what I am working on today: finding the error in the data and figuring out why the duplicates are present. Thank you again for the help! Cheers, Scott

sbennet · ‎02-01-2013

Hello Astounding, The matches can appear anywhere within the original data set and in most cases are not consecutive. If the data is perfect, each observation should have either exactly one match or no match. The data is a table representing the topology of a road network, so a two-way road has a matching pair of observations, one for each direction of travel. A one-way road has only one observation, representing its direction of travel. The big issue is that the matching observations do not have a common identifier besides the "to" and "from" columns. The code provided by PGstats seems to work well and gives me the results I was hoping for. Cheers, Scott

sbennet · ‎02-01-2013

Thank you, PG, for the suggestion. It works and runs very quickly. I need to tweak it a bit, as I seem to get some additional output pairs that shouldn't exist (about 1,000 extra). Thank you. Cheers, Scott

sbennet · ‎02-01-2013

Thank you for the suggestion. I tried something similar before but was unable to make it work because of the limits on storing such a large amount of data in memory. It would work if I had a smaller dataset. Cheers, Scott

sbennet · ‎02-01-2013

Hello, I am trying to figure out a way to determine which rows of data are duplicates of each other. It is easiest to explain my problem through an example. I have a dataset as follows:

    Edge  To  From  Flow  Exposure
    1     1   2     10    10
    2     2   1     5     10
    3     3   4     5     5
    4     4   3     5     5
    5     5   6     10    10
    6     6   5     5     5

The "edge" value is a unique identifier for each data row. I am trying to match the data based on the "to" and "from" columns. In this sample data set the matching pairs (by "edge" id) are as follows:

    1,2
    3,4
    5,6

The "to" and "from" values of edges 1 and 2 are the reverse of each other. I want to figure out which edges are matches for each other and also sum up the flow and exposure values for the matching edges. What I would like to end up with is a table as follows:

    Edge  To  From  Flow  Exposure  match  flow_sum  exposure_sum
    1     1   2     10    10        1      15        20
    2     2   1     5     10        1      15        20
    3     3   4     5     5         2      10        10
    4     4   3     5     5         2      10        10
    5     5   6     10    10        3      15        15
    6     6   5     5     5         3      15        15

In this case the "match" variable just identifies which edges are pairs. It doesn't have to be a number; it can be any kind of value and can start anywhere. Is this possible? The data set I am working with has about 300,000 rows of data, with the majority of them having pairs. Thank you for the help. Cheers, Scott
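One way to read the request above is as a grouping problem: two edges match when their unordered ("to", "from") pair is the same. The following is a minimal sketch of that idea in Python, not the SAS code suggested in the thread. The function name `match_reciprocal_edges` and the tuple layout are assumptions made for illustration; the rows are the sample data from the post.

```python
from collections import defaultdict

# Sample rows from the post: (edge, to, from, flow, exposure)
rows = [
    (1, 1, 2, 10, 10),
    (2, 2, 1, 5, 10),
    (3, 3, 4, 5, 5),
    (4, 4, 3, 5, 5),
    (5, 5, 6, 10, 10),
    (6, 6, 5, 5, 5),
]


def match_reciprocal_edges(rows):
    """Append (match, flow_sum, exposure_sum) to every row.

    Hypothetical helper: rows are grouped on the unordered (to, from)
    pair, so an edge 1 -> 2 lands in the same group as an edge 2 -> 1.
    """
    groups = defaultdict(list)
    for row in rows:
        _, to, frm, _, _ = row
        key = (min(to, frm), max(to, frm))  # direction-independent key
        groups[key].append(row)

    match_ids = {}  # key -> match number, assigned in order of first appearance
    result = []
    for row in rows:
        _, to, frm, _, _ = row
        key = (min(to, frm), max(to, frm))
        match = match_ids.setdefault(key, len(match_ids) + 1)
        flow_sum = sum(member[3] for member in groups[key])
        exposure_sum = sum(member[4] for member in groups[key])
        result.append(row + (match, flow_sum, exposure_sum))
    return result


matched = match_reciprocal_edges(rows)
for line in matched:
    print(line)
```

Note that this sketch puts every row sharing the same unordered pair into one group, so a one-way road forms a group of one, and an accidental repeat of the same direction would be summed into the group as well. Whether that is the desired handling of bad data is left open here.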

sbennet · ‎01-22-2013

That is a very efficient way of making this work. Because the number of columns in my data varies depending on which part of the data I am working with, I need to use a macro, either within the code you suggest or as suggested by Anca. Thank you for the suggestion. Cheers, Scott

sbennet · ‎01-22-2013

Thank you everyone for all your help! Both of the methods suggested by Arthur Tabachneck and Anca Tilea work. Anca's works better for my application. Thank you again for all the help! Cheers, Scott

sbennet · ‎01-22-2013

Hello, I am hoping to combine multiple columns into one column (by stacking them sequentially) within a dataset. I have to do this with a large number of columns. Here is an example. I have this:

    col1  col2  col3
    v1    v8    v15
    v2    v9    v16
    v3    v10   v17
    v4    v11   v18

I want to end up with this:

    col4
    v1
    v2
    v3
    v4
    v8
    v9
    v10
    v11
    v15
    v16
    v17
    v18

Thank you for the help. Cheers, Scott
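The stacking described above amounts to concatenating the columns end to end, in column order. A minimal Python sketch of that operation follows; it is an illustration only and not either of the SAS solutions mentioned in the replies. The function name `stack_columns` and the dictionary layout are assumptions.

```python
def stack_columns(table, column_names):
    """Concatenate the named columns end to end, in the order given.

    Hypothetical helper: `table` maps a column name to its list of values.
    """
    stacked = []
    for name in column_names:
        stacked.extend(table[name])
    return stacked


# Example data from the post
table = {
    "col1": ["v1", "v2", "v3", "v4"],
    "col2": ["v8", "v9", "v10", "v11"],
    "col3": ["v15", "v16", "v17", "v18"],
}

col4 = stack_columns(table, ["col1", "col2", "col3"])
print(col4)
```

Passing the column names as a list keeps the sketch usable when the number of columns varies from one part of the data to another.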

sbennet · ‎08-10-2012

Thank you for the help!

sbennet · ‎08-10-2012

Thank you for the help! That would totally work. I have to run multiple macros with similar variables, so a single variable definition that I can call from each of the different macro loops would work best for my current project.

sbennet · ‎08-10-2012

Is it really that simple? I didn't try that because the online help for %let said you can only assign one value to a variable created with %let. Thank you. Cheers, Scott

sbennet · ‎08-10-2012

Hello, I am trying to create a macro variable that will contain multiple variables in it. I thought a macro array might be the answer, but I am not finding any documentation that seems to use a macro array in this way (unless I am misinterpreting the white papers I have been reading). Here is a more detailed description of what I am trying to do.

1) I load a dataset in SAS with multiple variables in it.
2) I am running multiple proc select statements to generate databases of randomly selected cases that meet a given set of criteria. This set of criteria is defined by the variables in the dataset.
3) I am then running multiple logistic regression models using a different set of variables from the same dataset.

Because I am running multiple models using different variables in the proc select and the logistic regression, I have to change the by and var statements for the different procedures multiple times. I am hoping I can set up a global macro variable (like the macro %let command) where I define all the variables (in one global variable) for a specific procedure at the beginning of the code and then just call that macro variable in all the proc select and logistic regression procedures. Thank you for the help. Cheers, Scott
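The idea of defining the variable list once and reusing it everywhere can be illustrated outside SAS. The Python sketch below treats the whole space-separated list as a single text value that is pasted into each statement wherever it is needed. The variable names and both helper functions are hypothetical; the post does not name the actual variables or show the procedure code.

```python
# One text value holding several variable names, separated by spaces.
# The names are hypothetical; the post does not list the real ones.
model_vars = "age speed volume"


def by_statement(var_list):
    """Build the text of a by statement from the shared list (illustrative)."""
    return "by " + var_list + ";"


def var_statement(var_list):
    """Build the text of a var statement from the same shared list."""
    return "var " + var_list + ";"


# The list is defined once and reused; editing model_vars changes both.
statements = [by_statement(model_vars), var_statement(model_vars)]
names = model_vars.split()

print(statements)
print(names)
```

The point of the sketch is only that a list of names stored as text still counts as one value, which is why a single definition at the top of a program can feed several procedures.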


Re: Detect duplicate rows in a dataset

Re: Detect duplicate rows in a dataset

Re: Detect duplicate rows in a dataset

Re: Detect duplicate rows in a dataset

Re: Detect duplicate rows in a dataset

Detect duplicate rows in a dataset

Re: Combine columns within a dataset

Re: Combine columns within a dataset

Combine columns within a dataset

Re: Create a macro variable with multiple variables in it