My data has 1,201,250 rows, part of the data is as below.
In each form, each examinee take the same 250 item. There are a few same itemID across different forms. e.g. item pp012 in both AAA and BBB, pp012 and pp024 in both BBB and CCC. there are 1350 unique items across all forms. I want to transpose this data back to wide to make a SPARSE matrix in terms of raw. The desired SPARSE matrix should have 1350 columns in terms of raw score. SPARSE matrix means there are no data in a certain column after transposing.
DATA have ;
INPUT form examineeID _NAME_ itemID raw;
CARDS ;
AAA 111 raw_1 pp012 0
AAA 111 raw_2 pp014 1
AAA 111 raw_3 pp017 1
AAA 111 raw_4 pp015 0
..
AAA 444 raw_1 pp012 1
AAA 444 raw_2 pp014 0
AAA 444 raw_3 pp017 1
AAA 444 raw_4 pp015 1
...
BBB 777 raw_1 pp012 0
BBB 777 raw_2 pp024 1
BBB 777 raw_3 pp027 0
BBB 777 raw_4 pp025 1
...
BBB 999 raw_1 pp012 1
BBB 999 raw_2 pp024 1
BBB 999 raw_3 pp027 0
BBB 999 raw_4 pp025 0
....
CCC 666 raw_1 pp012 0
CCC 666 raw_2 pp024 1
CCC 666 raw_3 pp037 0
CCC 666 raw_4 pp035 1
...
CCC 888 raw_1 pp012 1
CCC 888 raw_2 pp024 1
CCC 888 raw_3 pp037 0
CCC 888 raw_4 pp035 0
...
I used the following code
proc sort data=have; by examineeid; run;
proc transpose data=have out=want (drop=_NAME_) delim=_;
by examineeid;
id itemid;
var raw;
run;
The resulted matrix is 4805x1350, which is the correct dimension. But from left to right, the columns SHOULD be in the ascending order of itemID. How can I get that order? I tried to sort "have" data by itemID examineeID before transposing, but the resulted matrix is not the correct dimension I wanted. I am stucked there for a while, much thanks to your help!
Please test your data step code before posting, so you can be sure it works and creates valid data.
Create a pattern dataset of all items, and prepend that before transposing; filter out the empty observation after the transpose:
data have;
input form $ examineeID $ _NAME_ $ itemID $ raw;
datalines;
AAA 111 raw_1 pp012 0
AAA 111 raw_2 pp014 1
AAA 111 raw_3 pp017 1
AAA 111 raw_4 pp015 0
AAA 444 raw_1 pp012 1
AAA 444 raw_2 pp014 0
AAA 444 raw_3 pp017 1
AAA 444 raw_4 pp015 1
BBB 777 raw_1 pp012 0
BBB 777 raw_2 pp024 1
BBB 777 raw_3 pp027 0
BBB 777 raw_4 pp025 1
BBB 999 raw_1 pp012 1
BBB 999 raw_2 pp024 1
BBB 999 raw_3 pp027 0
BBB 999 raw_4 pp025 0
CCC 666 raw_1 pp012 0
CCC 666 raw_2 pp024 1
CCC 666 raw_3 pp037 0
CCC 666 raw_4 pp035 1
CCC 888 raw_1 pp012 1
CCC 888 raw_2 pp024 1
CCC 888 raw_3 pp037 0
CCC 888 raw_4 pp035 0
;
proc sort
data=have (keep=itemid)
out=pattern
nodupkey
;
by itemid;
run;
proc sort data=have;
by examineeid;
run;
data pretrans;
set
pattern
have
;
run;
proc transpose
data=pretrans
out=want (drop=_NAME_ where=(examineeid ne ""))
delim=_
;
by examineeid;
id itemid;
var raw;
run;
Please test your data step code before posting, so you can be sure it works and creates valid data.
Create a pattern dataset of all items, and prepend that before transposing; filter out the empty observation after the transpose:
data have;
input form $ examineeID $ _NAME_ $ itemID $ raw;
datalines;
AAA 111 raw_1 pp012 0
AAA 111 raw_2 pp014 1
AAA 111 raw_3 pp017 1
AAA 111 raw_4 pp015 0
AAA 444 raw_1 pp012 1
AAA 444 raw_2 pp014 0
AAA 444 raw_3 pp017 1
AAA 444 raw_4 pp015 1
BBB 777 raw_1 pp012 0
BBB 777 raw_2 pp024 1
BBB 777 raw_3 pp027 0
BBB 777 raw_4 pp025 1
BBB 999 raw_1 pp012 1
BBB 999 raw_2 pp024 1
BBB 999 raw_3 pp027 0
BBB 999 raw_4 pp025 0
CCC 666 raw_1 pp012 0
CCC 666 raw_2 pp024 1
CCC 666 raw_3 pp037 0
CCC 666 raw_4 pp035 1
CCC 888 raw_1 pp012 1
CCC 888 raw_2 pp024 1
CCC 888 raw_3 pp037 0
CCC 888 raw_4 pp035 0
;
proc sort
data=have (keep=itemid)
out=pattern
nodupkey
;
by itemid;
run;
proc sort data=have;
by examineeid;
run;
data pretrans;
set
pattern
have
;
run;
proc transpose
data=pretrans
out=want (drop=_NAME_ where=(examineeid ne ""))
delim=_
;
by examineeid;
id itemid;
var raw;
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.