Hi Experts,
If you were asked to transfer the attached file into a long form, how would you do it?
there are 18 subjects in the attached file and each subject has three rows of information. There should be 114 observations and 4 variables(NEWID,AAA,BBB,CCC) in the final long form dataset.
NEWID should be 1-18.
Thanks - Linlin
Finally got around doing it :
data have;
array x{16}; /* Increase if needed */
infile "&sasforum.\datasets\sheet555.csv" dsd missover;
input type $ (x{*}) (??);
type = substr(type,1,3);
if not missing(type);
if upcase(type) = "AAA" then newid + 1;
run;
proc transpose data=have out=want(drop=_name_ where=(n(AAA,BBB,CCC)>0));
by newid;
var x:;
id type;
run;
PG
Hey Linlin,
This is the way I'd feel most comfortable reading in the data. However, I get a weird note on the first record. See what you think:
data long_form (keep=ID AAA BBB CCC);
infile "Q:\SAS Community\sheet555.csv" end=done;
length ID $4 AAA BBB CCC 8.;
do until (done=1);
line+1;
input #line @1;
A_Rec = _infile_;
line+1;
input #line @1;
B_Rec = _infile_;
line+1;
input #line @1;
C_Rec = _infile_;
ID=scan(scan(A_rec,1,","),2," ");
do i=2 to countw(A_Rec,",");
AAA=scan(A_rec,i,",");
BBB=scan(B_rec,i,",");
CCC=scan(C_rec,i,",");
if not missing(AAA) then output;
end;
line+1;
end;
run;
There are many other techniques, but using line pointers in this way is how I'd do it.
Hope this helps,
Huey
I don't like mine solution...its too hardcoded and I'm not sure its right, but its where I usually start.
data want;
set test;
retain newid 0;
check=substr(var1, 1, 3);
if check='AAA' then newid+1;
array var(*) var2--var9;
if check='AAA' then do;
do i=1 to dim(var);
AAA=var(i);
output;
end;
end;
else if check='BBB' then do;
do i=1 to dim(var);
BBB=var(i);
output;
end;
end;
else if check='CCC' then do;
do i=1 to dim(var);
CCC=var(i);
output;
end;
end;
run;
data want2;
merge want (where = (check='AAA') keep =newid check i aaa)
want (where=(check='BBB') keep = newid check i bbb)
want (where=(check='CCC') keep=newid check i ccc);
by newid i;
keep newid AAA BBB CCC;
run;
So it looks like you read the csv file in a previous step and then porcess the resultant dataset two more times to get what you want.
Did my method work for you?
Hi Huey and Reeza,
Thank you for your inputs! I used proc import. there is an error message in the log file but the outcome is fine.
data WORK.temp ;
infile 'c:\temp\forum\sheet555.csv' delimiter = ','
MISSOVER DSD lrecl=32767 ;
informat VAR1 $7. ;
informat VAR2 best32. ;
informat VAR3 best32. ;
informat VAR4 best32. ;
informat VAR5 best32. ;
informat VAR6 best32. ;
informat VAR7 best32. ;
informat VAR8 best32. ;
informat VAR9 best32. ;
format VAR1 $7. ;
format VAR2 best12. ;
format VAR3 best12. ;
format VAR4 best12. ;
format VAR5 best12. ;
format VAR6 best12. ;
format VAR7 best12. ;
format VAR8 best12. ;
format VAR9 best12. ;
input
VAR1 $
VAR2
VAR3
VAR4
VAR5
VAR6
VAR7
VAR8
VAR9
;
run;
data wide (drop=var1);
set temp;
newid=round((_n_+1)/4,1);
if missing(var1) then delete;
id=upcase(scan(var1,1));
run;
proc transpose data=wide out=wanted(drop=_:);
by newid;
id id;
var var2-var9;
run;
data long;
set wanted(where=(aaa ne .));
run;
/********* log file ********/
1 data WORK.temp ;
2 infile 'c:\temp\forum\sheet555.csv' delimiter = ','
3 MISSOVER DSD lrecl=32767 ;
4 informat VAR1 $7. ;
5 informat VAR2 best32. ;
6 informat VAR3 best32. ;
7 informat VAR4 best32. ;
8 informat VAR5 best32. ;
9 informat VAR6 best32. ;
10 informat VAR7 best32. ;
11 informat VAR8 best32. ;
12 informat VAR9 best32. ;
13 format VAR1 $7. ;
14 format VAR2 best12. ;
15 format VAR3 best12. ;
16 format VAR4 best12. ;
17 format VAR5 best12. ;
18 format VAR6 best12. ;
19 format VAR7 best12. ;
20 format VAR8 best12. ;
21 format VAR9 best12. ;
22 input
23 VAR1 $
24 VAR2
25 VAR3
26 VAR4
27 VAR5
28 VAR6
29 VAR7
30 VAR8
31 VAR9
32 ;
33
34 run;
NOTE: The infile 'c:\temp\forum\sheet555.csv' is:
Filename=c:\temp\forum\sheet555.csv,
RECFM=V,LRECL=32767,File Size (bytes)=2051,
Last Modified=29Jan2013:15:53:30,
Create Time=29Jan2013:18:00:47
NOTE: Invalid data for VAR4 in line 62 16-16.
RULE: ----+----1----+----2----+----3----+----4----+----5----+----6----+----7---
62 BBB,-4.6,-18.1,X,-4.5,1.3,-9.6,-9.3,-7 38
VAR1=BBB VAR2=-4.6 VAR3=-18.1 VAR4=. VAR5=-4.5 VAR6=1.3 VAR7=-9.6 VAR8=-9.3 VAR9=-7
_ERROR_=1 _N_=62
NOTE: 71 records were read from the infile 'c:\temp\forum\sheet555.csv'.
The minimum record length was 8.
The maximum record length was 53.
NOTE: The data set WORK.TEMP has 71 observations and 9 variables.
NOTE: DATA statement used (Total process time):
real time 0.29 seconds
cpu time 0.03 seconds
35 data wide (drop=var1);
36 set temp;
37 newid=round((_n_+1)/4,1);
38 if missing(var1) then delete;
39 id=upcase(scan(var1,1));
40 run;
NOTE: There were 71 observations read from the data set WORK.TEMP.
NOTE: The data set WORK.WIDE has 54 observations and 10 variables.
NOTE: DATA statement used (Total process time):
real time 0.12 seconds
cpu time 0.04 seconds
41
42 proc transpose data=wide out=wanted(drop=_:);
43 by newid;
44 id id;
45 var var2-var9;
46 run;
NOTE: There were 54 observations read from the data set WORK.WIDE.
NOTE: The data set WORK.WANTED has 144 observations and 4 variables.
NOTE: PROCEDURE TRANSPOSE used (Total process time):
real time 0.22 seconds
cpu time 0.00 seconds
47 data long;
48 set wanted(where=(aaa ne .));
49 run;
NOTE: There were 114 observations read from the data set WORK.WANTED.
WHERE aaa not = .;
NOTE: The data set WORK.LONG has 114 observations and 4 variables.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.00 seconds
Yeah, that 'X' messed up my input method as well. Looks like the data contain "X's" for missing data?
Finally got around doing it :
data have;
array x{16}; /* Increase if needed */
infile "&sasforum.\datasets\sheet555.csv" dsd missover;
input type $ (x{*}) (??);
type = substr(type,1,3);
if not missing(type);
if upcase(type) = "AAA" then newid + 1;
run;
proc transpose data=have out=want(drop=_name_ where=(n(AAA,BBB,CCC)>0));
by newid;
var x:;
id type;
run;
PG
Hi PG,
Thank you very much!!! I am glad I posted the question. Would you please explain more about the double question marks?
input type $ (x{*}) (??);
Linlin
Hi!
From the doc :
The ? modifier suppresses the invalid data message. The ?? modifier also
suppresses the invalid data message and, in addition, prevents the automatic
variable _ERROR_ from being set to 1 when invalid data are read.
PG
Thank you PG! I wonder what caused the invalid message, Maybe some invisible characters in the file?
As hdodson mentioned, there is a letter X on line 62 of the csv file where there should be a number.
Thank you PG! I did not notice the "X".
Good night! - Linlin
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.