Hi,
I was wondering if someone can please help me?
I have three datasets, all of them have one common column called file name.
First dataset contains 2000 rows all unique rows (One row per filename).
Second one has multiple rows per file name, some filenames have 7 rows and some more than 7.
Similar for third dataset.
I am looking to integrate second and third datasets in a way that both of these datasets have same one row per filename.
Can anyone suggest how should I proceed for that?
So, that I can merge both of these datasets into First dataset as features.
Sample table
Filename | Category | Type | Col1 | Col2 | Col3 | Col4 | Co5 | Flg | Read | Write | Execute | Align |
1 | 1 | LOAD | 0x000000 | 00010000 | 00010000 | 1515c | 1515c | R E | 1 | 0 | 1 | 0x10000 |
1 | 1 | GNU_STACK | 0x000000 | 00000000 | 00000000 | 00000 | 00000 | RW | 1 | 1 | 0 | 0x4 |
1 | 1 | LOAD | 0x01515c | 0003515c | 0003515c | 00518 | 06874 | RW | 1 | 1 | 0 | 0x10000 |
2 | 1 | LOAD | 0x000000 | 08048000 | 08048000 | 0d0a0 | 0d0a0 | R E | 1 | 0 | 1 | 0x1000 |
2 | 1 | GNU_STACK | 0x000000 | 00000000 | 00000000 | 00000 | 00000 | RW | 1 | 1 | 0 | 0x4 |
2 | 1 | LOAD | 0x00d0a0 | 080560a0 | 080560a0 | 0031c | 06800 | RW | 1 | 1 | 0 | 0x1000 |
3 | 2 | LOAD | 0x000000 | 00010000 | 00010000 | 1123c | 1123c | R E | 1 | 0 | 1 | 0x10000 |
3 | 2 | GNU_STACK | 0x000000 | 00000000 | 00000000 | 00000 | 00000 | RW | 1 | 1 | 0 | 0x4 |
3 | 2 | LOAD | 0x01123c | 0003123c | 0003123c | 004f4 | 06644 | RW | 1 | 1 | 0 | 0x10000 |
Regards
You should provide some data, enough to demonstrate what you need, for each of the tables and what the final result for the example data should look like. From your description possibly 2 rows for the first table mentioned and cases for the other data sets such that one of the filenames from the first data is only used once and the other 3 times. You don't need to provide ALL of the variables but enough to demonstrate what you want done with other common variables if they exist. For instance if the variable XYZ exists in sets 2 and 3 which version do you want in the final result when they differ?
It really helps to provide data in the form of a datastep. Instructions here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... will show how to turn an existing SAS data set into data step code that can be pasted into a forum code box using the {i} icon or attached as text to show exactly what you have and that we can test code against.
It also helps to show code or describe what you have attempted so far.
Hi Ballard,
Thanks for your reply, I am attaching sample XLS sheet for your reference of the datasets. Appologies for not creating dataset they way you asked, I am new to SAS so need study a little bit more.
Sheet 1 contains main data in which I need to integrate sheet 2 and sheet 3. Its like 1 to many relationship. As sheet2 and 3 contain multiple rows of the same files present in sheet1.
Sheet 2 and 3 contains fields highlighted as red, about them I am not sure how to integrate. One way I thought was to integrate like a count of unique addresses per file for example for file 1 , in sd_off the count will be 2 , sd_virtad count=3 etc.
In sheet2, sd_type becomes the feature and represented as a yes/no value in integrated sheet.
In sheet3, td_sname and td_stype are two important columns and have unique combinations of values as for every td_sname there is a td_stype, so, every unique combination becomes a column and represented as a yes/no value for each file name its present as 1 and 0 for rest.
and red fields are the counters and flags should be the same.
Hope it clarifies more about my problem. Main thing is sheet 2 and 3 should be integrated in such a way that every file has one row representation only. Dont bother how many fields.
Regards
If you want to reduce multiple rows to one there three basic methods.
1) You could perform summary functions over the values. So a numeric column could be reduced to the sum of the values in the column. Or perhaps you want to take the min or max value, that could work for both numeric and character variables.
2) You could TRANSPOSE the rows into columns. So if there are three observations be group your could convert that into three columns for one observation.
3) You could pick the best observation and only use the data from that observation. So perhaps the first one or the last one or the one with largest value for some column.
Which of these methods do you want to do to transform your second and third datasets?
You will need to explain the data and the collapsing logic more clearly. Please include concrete (simplified) examples. You could for show an example of 3 rows of data for the input and what you expect the single row of data in the output to look like. Include the example in the text of your message and NOT as an attachment. And if you must use an attachment then attach a text file and not a proprietory format like XLSX or DOCX.
I have no idea what you mean by hex values. It sounds like you want to make a list of distinct values? Do you want those values concatenated into a single longer character variable? Does the order of the original observations matter, or could you sort the data to eliminate duplicate values?
If by flags you mean boolean 0/1 values then usually you will want to either take the MAX() over the observations which is effectively an OR across the observations or SUM() which is effectively a count of the number of true cases.
The way you can create a clear example of what you want is to copy and paste your data as TEXT ONLY. No tabs. No HTML table. Make sure to use the Insert code or Insert SAS code icons in the rich text editor on this site so that it will preserve the spaces and be displayed using non-proportional font. If you want to edit the text in the code boxes remember to put the cursor inside the box and click the icon again so that you can edit in the pop-up window. If you try to edit without doing that it messes up the formatting of the line breaks.
Do not try to post an example with more than one of two variables and one or two groups (filenames?). Once you know how to do the transformation for variable, say SD_TYPE, then you can do it for other variables that you want to process in the same way yourself.
The basic structure of your program is going to be.
data want ;
merge sheet1 new_sheet2 new_sheet3 ;
by filename;
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.
Find more tutorials on the SAS Users YouTube channel.