My data has 1,201,250 rows, part of the data is as below.
In each form, each examinee take the same 250 item. There are a few same itemID across different forms. e.g. item pp012 in both AAA and BBB, pp012 and pp024 in both BBB and CCC. there are 1350 unique items across all forms. I want to transpose this data back to wide to make a SPARSE matrix in terms of raw. The desired SPARSE matrix should have 1350 columns in terms of raw score. SPARSE matrix means there are no data in a certain column after transposing.
DATA have ;
INPUT form examineeID _NAME_ itemID raw;
CARDS ;
AAA 111 raw_1 pp012 0
AAA 111 raw_2 pp014 1
AAA 111 raw_3 pp017 1
AAA 111 raw_4 pp015 0
..
AAA 444 raw_1 pp012 1
AAA 444 raw_2 pp014 0
AAA 444 raw_3 pp017 1
AAA 444 raw_4 pp015 1
...
BBB 777 raw_1 pp012 0
BBB 777 raw_2 pp024 1
BBB 777 raw_3 pp027 0
BBB 777 raw_4 pp025 1
...
BBB 999 raw_1 pp012 1
BBB 999 raw_2 pp024 1
BBB 999 raw_3 pp027 0
BBB 999 raw_4 pp025 0
....
CCC 666 raw_1 pp012 0
CCC 666 raw_2 pp024 1
CCC 666 raw_3 pp037 0
CCC 666 raw_4 pp035 1
...
CCC 888 raw_1 pp012 1
CCC 888 raw_2 pp024 1
CCC 888 raw_3 pp037 0
CCC 888 raw_4 pp035 0
...
I used the following code
proc sort data=have; by examineeid; run;
proc transpose data=have out=want (drop=_NAME_) delim=_;
by examineeid;
id itemid;
var raw;
run;
The resulted matrix is 4805x1350, which is the correct dimension. But from left to right, the columns SHOULD be in the ascending order of itemID. How can I get that order? I tried to sort "have" data by itemID examineeID before transposing, but the resulted matrix is not the correct dimension I wanted. I am stucked there for a while, much thanks to your help!
Please test your data step code before posting, so you can be sure it works and creates valid data.
Create a pattern dataset of all items, and prepend that before transposing; filter out the empty observation after the transpose:
data have;
input form $ examineeID $ _NAME_ $ itemID $ raw;
datalines;
AAA 111 raw_1 pp012 0
AAA 111 raw_2 pp014 1
AAA 111 raw_3 pp017 1
AAA 111 raw_4 pp015 0
AAA 444 raw_1 pp012 1
AAA 444 raw_2 pp014 0
AAA 444 raw_3 pp017 1
AAA 444 raw_4 pp015 1
BBB 777 raw_1 pp012 0
BBB 777 raw_2 pp024 1
BBB 777 raw_3 pp027 0
BBB 777 raw_4 pp025 1
BBB 999 raw_1 pp012 1
BBB 999 raw_2 pp024 1
BBB 999 raw_3 pp027 0
BBB 999 raw_4 pp025 0
CCC 666 raw_1 pp012 0
CCC 666 raw_2 pp024 1
CCC 666 raw_3 pp037 0
CCC 666 raw_4 pp035 1
CCC 888 raw_1 pp012 1
CCC 888 raw_2 pp024 1
CCC 888 raw_3 pp037 0
CCC 888 raw_4 pp035 0
;
proc sort
data=have (keep=itemid)
out=pattern
nodupkey
;
by itemid;
run;
proc sort data=have;
by examineeid;
run;
data pretrans;
set
pattern
have
;
run;
proc transpose
data=pretrans
out=want (drop=_NAME_ where=(examineeid ne ""))
delim=_
;
by examineeid;
id itemid;
var raw;
run;
Please test your data step code before posting, so you can be sure it works and creates valid data.
Create a pattern dataset of all items, and prepend that before transposing; filter out the empty observation after the transpose:
data have;
input form $ examineeID $ _NAME_ $ itemID $ raw;
datalines;
AAA 111 raw_1 pp012 0
AAA 111 raw_2 pp014 1
AAA 111 raw_3 pp017 1
AAA 111 raw_4 pp015 0
AAA 444 raw_1 pp012 1
AAA 444 raw_2 pp014 0
AAA 444 raw_3 pp017 1
AAA 444 raw_4 pp015 1
BBB 777 raw_1 pp012 0
BBB 777 raw_2 pp024 1
BBB 777 raw_3 pp027 0
BBB 777 raw_4 pp025 1
BBB 999 raw_1 pp012 1
BBB 999 raw_2 pp024 1
BBB 999 raw_3 pp027 0
BBB 999 raw_4 pp025 0
CCC 666 raw_1 pp012 0
CCC 666 raw_2 pp024 1
CCC 666 raw_3 pp037 0
CCC 666 raw_4 pp035 1
CCC 888 raw_1 pp012 1
CCC 888 raw_2 pp024 1
CCC 888 raw_3 pp037 0
CCC 888 raw_4 pp035 0
;
proc sort
data=have (keep=itemid)
out=pattern
nodupkey
;
by itemid;
run;
proc sort data=have;
by examineeid;
run;
data pretrans;
set
pattern
have
;
run;
proc transpose
data=pretrans
out=want (drop=_NAME_ where=(examineeid ne ""))
delim=_
;
by examineeid;
id itemid;
var raw;
run;
April 27 – 30 | Gaylord Texan | Grapevine, Texas
Walk in ready to learn. Walk out ready to deliver. This is the data and AI conference you can't afford to miss.
Register now and lock in 2025 pricing—just $495!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.