Thank you so much ! So the "repeated by values" is not because the data is being stacked (long) and I would have more than one observation for each OBS? It is coming from the original? If so, I would have to find out which one it is, because that would indicate a mistake in my raw data. We are not supposed to have repeated obs for each OBS in the original file. I see only one observation with a value of 1 in column 2 which would mean I only have one repeated Id maybe. Could I use COL2 to find out this quickly? If so, how? Thanks again.
Just take the OBS value from the observation where COL2 is not missing, and use that to search in your raw data. Checking for duplicates is one part of my Maxim 3 (Know Your Data).
yes, of course. Sometimes I am trying to do things differently and forget the basics. I did find the one id that was indeed incorrectly entered. Col2 is then perfect for me to check for these issues. I really appreciate your help. I have another question. How do I transpose string and date variables? I tried to use the code provided but it did not work. I will give an example because it will add complexity.
I am using the transpose for all the numeric variables. Some of them, not all, have a corresponding string variable. So for example in the example I gave, if med 1 is selected a string one will appear. If med 2 is selected a string 2 will appear and so on. So I would need to create a common var String (instead of string 1, string 2 and so on) that would assign each string to each key selected in the Med var. I would do the same with dates variables that are sometimes asked for the numeric variables the same way I would do the strings. Thank you again.
The double transpose is only good for variables with similar attributes; for mixed types/attributes (e.g.character, "normal" numeric and numeric/dates), it's better to transpose the variable groups separately and merge back.
Understood. I am planning on doing that. But I am not sure if I need to change the transpose, when I use it for string and dates it does not work. I am wondering if I am doing something wrong.I am using the code provided here and adding the string variable names where you see the numeric variable names. there is probably a change needed for variables that are not numeric?
Sorry about the multiple messages. I read that if I have a mix of numeric and characters I need to transform all of them into characters first. Is that the only way? And would date variables also be transformed into character without giving me issues?
The information I found was to use var _all_;
But if I am transposing one by one I was hoping to be able to transpose the characters as characters and dates as dates. Is that possible in this case?
You could use _ALL_ or _CHARACTER_ or _NUMERIC_ in the VAR statement. To distinguish just the DATE variables from the other numeric variables you will need to know the names of the variables or use some code to generate the list by checking the format type attached to the variables.
Note if you don't include the VAR statement then PROC TRANSPOSE does the numeric variables. So these two steps are the same:
proc transpose data=have out=want ;
run;
proc transpose data=have out=want ;
var _numeric_;
run;
That is good to know. What I don't understand is why my code (please see below) for transposing the character variables are not working as they do for numeric. I must be doing something wrong. I just can't see it. Thank you.
proc transpose
data=have
out=long (
where=(col1 ne ' ') * since this is a string var I changed this from 0 to ' ' I also tried this with ne 0
)
;
by id;
var xxxxxxS:;
run;
data want;
set long;
xxxxxxS = scan(_name_,-1,"_");
drop _name_ col1;
run;
What is different about the results? Remove the WHERE= dataset option and look at the resulting dataset to try to figure out what is happening. You could add the OBS= dataset option on the input dataset to PROC TRANSPOSE to just process the first few observations to make it easier to check.
proc transpose data=have(obs=2) out=long ;
by id;
var xxxxxxS:;
run;
proc print data=long;
run;
Also check to make sure all of the selected variables are character. If any are numeric then your test for ' ' will not match the missing numeric values since those will most likely end up having a period with multiple leading spaces. So you might want this filter instead.
where left(col1) not in (' ','.');
Or you could temporarily change the MISSING option to display missing as space instead of period.
I checked the variable as you suggested, and it looks right. Please see below. I have these 3 xxxS variables that I am trying to transpose into one xxxS. As you can see only one option of one obs has data entered. This will be similar throughout. Some will have nothing entered and others will. There is no way for me to have any numeric values in these (due to the way the data is collected). I tried your suggestion and it still shows the variable I am creating xxxS as numbers not characters. Please see outputs below. I can't figure out why. Thanks.
obs | id | name | COL1 this becomes my var xxxS |
1 | A | xxxS_1 | |
2 | A | xxxS_2 | |
3 | A | xxxS_3 | Characters entered |
4 | B | xxxS_1 | |
5 | B | xxxS_2 | |
6 | B | xxxS_3 |
obs | id | name | var xxxS |
1 | A | xxxS_1 | 1 |
2 | A | xxxS_2 | 2 |
3 | A | xxxS_3 | 3 |
4 | B | xxxS_1 | 1 |
5 | B | xxxS_2 | 2 |
6 | B | xxxS_3 | 3 |
You are setting that variable to the suffix on the NAME of the original variable.
xxxxxxS = scan(_name_,-1,"_");
Is that not what you wanted to do?
If you want to just rename COL1 to xxxxxxS then just use a RENAME statement (or RENAME= dataset option).
Perhaps you want to only take the suffix when the value is not missing?
data have ;
input obs id $ _name_ $ COL1 $30. ;
cards;
1 A xxxS_1
2 A xxxS_2
3 A xxxS_3 Characters entered
4 B xxxS_1
5 B xxxS_2
6 B xxxS_3
;
data want;
set have ;
suffix = scan(_name_,-1,'_');
if not missing(col1) then suffix2=scan(_name_,-1,'_');
run;
proc print;
run;
Obs obs id _name_ COL1 suffix suffix2 1 1 A xxxS_1 1 2 2 A xxxS_2 2 3 3 A xxxS_3 Characters entered 3 3 4 4 B xxxS_1 1 5 5 B xxxS_2 2 6 6 B xxxS_3 3
Yes. I think I do not know what the Scan does when I transpose the numeric.
why do I need it for numeric and not for character then is my question.
Do I just replace the scan with rename ?
when I use (
where=(col1 ne ' ')
) ;
data file created is
obs | id | vars S |
1 | A | 2 |
2 | B | 1 |
Better but not what I need. The part of the code that looks like it is not working is this one:
data INSD;
set long;
mm03a56051S = scan(_name_,-1,"_");
drop _name_ col1;
run;
Ok did that. All I need is that my col1 become my new data as it was doing for the numeric vars.
This is what I get now:
Obs | id | _NAME_ | COL1 | COL2 | suffix | suffix2 |
1 | A | xxxS_2 | entered character data | 2 | 2 | |
2 | B | xxxS_1 | entered character data | 1 | 1 | |
3 | C | xxxS_2 | entered character data | 2 | 2 | |
4 | D | xxxS_1 | entered character data | 1 | 1 | |
5 | E | xxxS_2 | entered character data | 2 | 2 |
Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.