Hello!
Thank you all in advance for your help. I am working from SAS 9.4, and am having trouble transposing data. My dataset is currently in a "wide" format and I would like to make it a "long format". Below is an example of my imported data that I am trying to transpose.
I have successfully transposed the data from wide to long using the code below:
*Transpose Data from wide to long, Needed to sort first;
Proc sort data=Traffic_Ranks_2;
by Counties;
Run;
Proc Transpose data=Traffic_Ranks_2 out=long_Ranks1 prefix=Pop;
by Counties;
var Pop_2014-Pop_2018;
run;
Proc Transpose data=Traffic_Ranks_2 out=long_Ranks2 prefix=Traffic_Count;
by Counties;
var Traffic_Count_2014-Traffic_Count_2018;
run;
Proc Transpose data=Traffic_Ranks_2 out=long_Ranks3 prefix=Traffic_FR;
by Counties;
var Traffic_FR_2014-Traffic_FR_2018;
run;
proc print data=long_Ranks2 (obs=2);
run;
*Merge Transposed Datasets;
Data Traffic_Ranks_Long;
merge long_Ranks1 (rename=(Pop1=Pop) drop=_name_) long_Ranks2(rename=(Traffic_Count1=Traffic_Count) drop=_name_)
long_Ranks3(rename=(Traffic_FR1=Traffic_FR) drop=_name_);
by Counties;
Year=input(substr(_name_, 15), 4.);
drop _name_;
drop Pop1 Traffic_Count1 Traffic_FR1;
run;
Proc Print data=Traffic_Ranks_Long (obs=1);
Run;
The printed version came out as such:
I am needing help extracting the year correctly, after tranposing the data. I have researched substr syntax and found teh following: the <variable=>SUBSTR(string, position <, length>), however no numbers that I use are correctly extracting the year. I would like this output dataset to reflect as the above, but have the full year (4 digits).
Thank you again for your help and I look forward to hearing from you!
Best,
Briana Thrift
It's not 'subract\ but, rather, substring.
You're currently using:
Data Traffic_Ranks_Long;
merge long_Ranks1 (rename=(Pop1=Pop)) long_Ranks2(rename=(Traffic_Count1=Traffic_Count))
long_Ranks3(rename=(Traffic_FR1=Traffic_FR));
by Counties;
Year=input(substr(_name_, 11), 4.);
drop Pop1 Traffic_Count1 Traffic_FR1;
run;
No need to drop Pop1 Count1 and Traffic_FR1, as they no longer exist since you renamed them in the merge statement.
My suggestion would be to (1)add a drop=_name_ option to two of your merge files (i.e.:
Data Traffic_Ranks_Long;
merge long_Ranks1 (drop=_name_ rename=(Pop1=Pop)) long_Ranks2(drop=_name_ rename=(Traffic_Count1=Traffic_Count))
long_Ranks3(rename=(Traffic_FR1=Traffic_FR));
by Counties;
Year=input(substr(_name_, 12), 4.);
drop _name_;
run;
That way you would always be taking the year from 'Traffic_FR_XXXX', beginning with the 12th character.
Art, CEO, AnalystFinder.com
Show us a few typical values of variable _NAME_ from the last data set.
Thank you for your response.
This is the output of _name_ for the first transposed dataset:
The output of _name_ for the second transposed dataset:
And finally, the output of _name_ for the last transposed dataset:
In the merged dataset of all three separate transposed datasets, I dropped the _name_ variable and renamed for each variable to have their own column (Pop, Traffic_Count, Traffic_FR).
Thank you!
If you are dropping _NAME_ in the MERGE statement, you have nothing for SUBSTR to work with. So you always get missing as the result.
Here is a method that accomplishes the whole task without requiring any sort or multiple steps:
filename ut url 'http://tiny.cc/untranspose_macro';
%include ut ;
%untranspose(data=Traffic_Ranks_2, out=want, by=County,
id=year, var=Pop Traffic_Count Traffic_FR, delimiter=_)
Conversely, yes, given your current method you can easily extract year, using the substr function, as long as you keep the _name_ variable.
Art, CEO, AnalystFinder.com
Thank you for all your response. My issue is knowing what numbers to use in the "subtract" syntax to correctly extract the year variables. The current code I have, after dropping _name_, is the following:
*Merge Transposed Datasets;
Data Traffic_Ranks_Long;
merge long_Ranks1 (rename=(Pop1=Pop)) long_Ranks2(rename=(Traffic_Count1=Traffic_Count))
long_Ranks3(rename=(Traffic_FR1=Traffic_FR));
by Counties;
Year=input(substr(_name_, 11), 4.);
drop Pop1 Traffic_Count1 Traffic_FR1;
run;
Proc Print data=Traffic_Ranks_Long (obs=5);
Run;
My output is still missing year, as the following:
I think my main issue in the "subtract" syntax is the number "11 and 4". I know 4 is correct, because I was the year to be formatted to 4 digits (i.e.: year 2020, instead of year 20). However, I am having a hard time figuring out what the first number should be. I thought the first number represented how many characters/numbers to subtract from an original variable name to yield the variable name of interest. For example, subtract 5 numbers from "Pop_2018" (subtracting letters P,O,P,_) to start on "2" of "2018" and yield 2018 as my final variable name. However, no number seems to yield the year variable correctly.
I hope this explanation makes sense. Please let me know if you have any questions, and I look forward to your answer. Prefer non-macro answers please, as I do not understand SAS macro coding yet.
Thank you!
It's not 'subract\ but, rather, substring.
You're currently using:
Data Traffic_Ranks_Long;
merge long_Ranks1 (rename=(Pop1=Pop)) long_Ranks2(rename=(Traffic_Count1=Traffic_Count))
long_Ranks3(rename=(Traffic_FR1=Traffic_FR));
by Counties;
Year=input(substr(_name_, 11), 4.);
drop Pop1 Traffic_Count1 Traffic_FR1;
run;
No need to drop Pop1 Count1 and Traffic_FR1, as they no longer exist since you renamed them in the merge statement.
My suggestion would be to (1)add a drop=_name_ option to two of your merge files (i.e.:
Data Traffic_Ranks_Long;
merge long_Ranks1 (drop=_name_ rename=(Pop1=Pop)) long_Ranks2(drop=_name_ rename=(Traffic_Count1=Traffic_Count))
long_Ranks3(rename=(Traffic_FR1=Traffic_FR));
by Counties;
Year=input(substr(_name_, 12), 4.);
drop _name_;
run;
That way you would always be taking the year from 'Traffic_FR_XXXX', beginning with the 12th character.
Art, CEO, AnalystFinder.com
Thank you so much! That worked perfectly. My last question is, how do I know which variable to refer to for the substring syntax?
For instance, do I use the following syntax to take year from 'Traffic_FR_XXXX', beginning with the 12th character?:
Year=input(substr(_name_, 12), 4.);
Or if I should use the following syntax to take year from 'Traffic_Count_XXXX', beginning with the 15th character?:
Year=input(substr(_name_, 15), 4.);
Or if I should use the following syntax take year from 'Pop_XXXX', beginning with the 5th character? :
Year=input(substr(_name_, 5), 4.);
Thank you again for your help!
Best,
Briana Thrift
You are merging three files that each have an _name_ variable. However, as suggested, dropping the _name_ variable from two of the three files results in your only having one name variable (namely the one that wasn't dropped). Given the code you posted, the third file will always have an _name_ equal to Traffic_FR_XXXX, where XXXX represents the year).
Art, CEO, AnalystFinder.com
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.