I have an extremely large csv dataset (in the millions) that is formatted in an odd way. There is only one column for variable values, and another column with a code for what each value is measuring. So, for example, if I had a small dataset with two variables A and B that was formatted the normal way:
Year A B
1994 10 30
1994 20 40
In the scheme I was given, it would be formatted in this way:
Year Value Code CodeDescription
1994 10 2844 "A"
1994 20 2844 "A"
1994 30 2855 "B"
1994 40 2855 "B"
Using some simple "if" statements, I can read in the data in this way to separate it all out:
if(Code = 2844) then A = Value
if(Code = 2855) then B = Value
When I read in the data and sort it by year, it will come out in this way:
Year A B
1994 10 .
1994 20 .
1994 . 30
1994 . 40
Even if I sort it by date, it will come out this way. I would like it to be formatted as such:
Year A B
1994 10 30
1994 20 40
I've tried a bunch of different options but cannot seem to get it to sort properly. Some of the variables have more measurements than others, and my goal is to have it aligned such that any missing variables will be at the end of the date sets, and the data can be aligned as best as it can be. I have found that, due to the way the data is originally formatted, SAS will treat each read as an individual observation, and set any other variables to missing.
Does anyone know of a way I can do this? I have tried everything from exporting it and re-reading it in to transposing twice to see if that would make any sort of difference.
It is not clear to me that it is valid to do what you want. Why did you put A=10 and B=30 in the same row instead of A=20 and B=30?
If we assume that the values of the same variable for the same year can be matched in the order that they appear in the source file then we just need to introduce a new variable to let PROC TRANSPOSE do the work for us.
data have ;
length name $34 ;
input year value code name $ ;
name=dequote(name);
cards;
1994 10 2844 "A"
1994 20 2844 "A"
1994 30 2855 "B"
1994 40 2855 "B"
run;
data vertical ;
set have ;
by year code ;
if first.code then row=1;
else row+1;
run;
proc sort ;
by year row code ;
run;
proc transpose data=vertical out=want ;
by year row ;
id name ;
var value ;
run;
Obs year row _NAME_ A B
1 1994 1 value 10 30
2 1994 2 value 20 40
It is not clear to me that it is valid to do what you want. Why did you put A=10 and B=30 in the same row instead of A=20 and B=30?
If we assume that the values of the same variable for the same year can be matched in the order that they appear in the source file then we just need to introduce a new variable to let PROC TRANSPOSE do the work for us.
data have ;
length name $34 ;
input year value code name $ ;
name=dequote(name);
cards;
1994 10 2844 "A"
1994 20 2844 "A"
1994 30 2855 "B"
1994 40 2855 "B"
run;
data vertical ;
set have ;
by year code ;
if first.code then row=1;
else row+1;
run;
proc sort ;
by year row code ;
run;
proc transpose data=vertical out=want ;
by year row ;
id name ;
var value ;
run;
Obs year row _NAME_ A B
1 1994 1 value 10 30
2 1994 2 value 20 40
Thank you so much. This did exactly what I needed it to do. I am an undergraduate in stats and am still grasping the more advanced portions of SAS, and this has been an extraordinary help. By having the data in this way, I am able to perform the quantile regression analysis I have been asked to do.
data have;
input Year Value Code CodeDescription $;
cards;
1994 10 2844 "A"
1994 20 2844 "A"
1994 30 2855 "B"
1994 40 2855 "B"
;
data want;
set have;
cd=translate(CodeDescription,'','"');
proc transpose data=want out=want11;
by cd;
var value;
run;
proc transpose data=want11 out=want12;
id cd;
var col:;
run;
proc transpose data=want out=want21;
by cd;
var year ;
run;
proc transpose data=want21 out=want22 let;
id _name_;
run;
data final(drop=_name_);
merge want22 want12;
year=ifn(year eq . ,lag1(year),year);
run;
proc print;run;
Linlin
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.