BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Linlin
Lapis Lazuli | Level 10

Hi Experts,

If you were asked to transfer the attached file into a long form, how would you do it?

there are 18 subjects in the attached file and each subject has three rows of information. There should be 114 observations and 4 variables(NEWID,AAA,BBB,CCC) in the final long form dataset.

NEWID should be 1-18.

Thanks - Linlin

1 ACCEPTED SOLUTION

Accepted Solutions
PGStats
Opal | Level 21

Finally got around doing it :

data have;
array x{16}; /* Increase if needed */
infile "&sasforum.\datasets\sheet555.csv" dsd missover;
input type $ (x{*}) (??);
type = substr(type,1,3);
if not missing(type);
if upcase(type) = "AAA" then newid + 1;
run;

proc transpose data=have out=want(drop=_name_ where=(n(AAA,BBB,CCC)>0));
by newid;
var x:;
id type;
run;

PG

PG

View solution in original post

11 REPLIES 11
hdodson_pacificmetrics_com
Calcite | Level 5


Hey Linlin,

This is the way I'd feel most comfortable reading in the data. However, I get a weird note on the first record. See what you think:

data long_form (keep=ID AAA BBB CCC);
     infile "Q:\SAS Community\sheet555.csv" end=done;

     length ID $4 AAA BBB CCC 8.;
     do until (done=1);
          line+1;
          input #line @1;
          A_Rec = _infile_;

          line+1;
          input #line @1;
          B_Rec = _infile_;

          line+1;
          input #line @1;
          C_Rec = _infile_;


          ID=scan(scan(A_rec,1,","),2," ");
          do i=2 to countw(A_Rec,",");
               AAA=scan(A_rec,i,",");
               BBB=scan(B_rec,i,",");
               CCC=scan(C_rec,i,",");
               if not missing(AAA) then output;
          end;

          line+1;
end;
run;

There are many other techniques, but using line pointers in this way is how I'd do it.

Hope this helps,

Huey

Reeza
Super User

I don't like mine solution...its too hardcoded and I'm not sure its right, but its where I usually start.

data want;

    set test;

    retain newid 0;

    check=substr(var1, 1, 3);

    if check='AAA' then newid+1;

    array var(*) var2--var9;

    if check='AAA' then do;

        do i=1 to dim(var);

        AAA=var(i);

        output;

        end;

    end;

    else if check='BBB' then do;

        do i=1 to dim(var);

            BBB=var(i);

            output;

        end;

    end;

    else if check='CCC' then do;

        do i=1 to dim(var);

            CCC=var(i);

            output;

        end;

    end;

run;

data want2;

    merge want (where = (check='AAA') keep =newid check i aaa)

            want (where=(check='BBB') keep = newid check i bbb)

            want (where=(check='CCC') keep=newid check  i ccc);

    by newid i;

    keep newid AAA BBB CCC;

run;

hdodson_pacificmetrics_com
Calcite | Level 5

So it looks like you read the csv file in a previous step and then porcess the resultant dataset two more times to get what you want.

Did my method work for you?

Linlin
Lapis Lazuli | Level 10

Hi Huey and Reeza,

Thank you for your inputs! I used proc import. there is an error message in the log file but the outcome is fine.

data WORK.temp    ;

     infile 'c:\temp\forum\sheet555.csv' delimiter = ','

MISSOVER DSD lrecl=32767 ;

        informat VAR1 $7. ;

        informat VAR2 best32. ;

         informat VAR3 best32. ;

         informat VAR4 best32. ;

         informat VAR5 best32. ;

         informat VAR6 best32. ;

         informat VAR7 best32. ;

        informat VAR8 best32. ;

         informat VAR9 best32. ;

         format VAR1 $7. ;

        format VAR2 best12. ;

         format VAR3 best12. ;

        format VAR4 best12. ;

        format VAR5 best12. ;

        format VAR6 best12. ;

        format VAR7 best12. ;

        format VAR8 best12. ;

         format VAR9 best12. ;

      input

                  VAR1 $

                  VAR2

                 VAR3

                  VAR4

                  VAR5

                  VAR6

                  VAR7

                 VAR8

                  VAR9

      ;

  

     run;

data wide (drop=var1);

set temp;

newid=round((_n_+1)/4,1);

if missing(var1) then delete;

id=upcase(scan(var1,1));

run;

proc transpose data=wide out=wanted(drop=_:);

   by newid;

   id id;

   var var2-var9;

run;

data long;

   set wanted(where=(aaa ne .));

run;

/********* log file ********/

1    data WORK.temp    ;

2         infile 'c:\temp\forum\sheet555.csv' delimiter = ','

3    MISSOVER DSD lrecl=32767 ;

4            informat VAR1 $7. ;

5            informat VAR2 best32. ;

6             informat VAR3 best32. ;

7             informat VAR4 best32. ;

8             informat VAR5 best32. ;

9             informat VAR6 best32. ;

10            informat VAR7 best32. ;

11           informat VAR8 best32. ;

12            informat VAR9 best32. ;

13            format VAR1 $7. ;

14           format VAR2 best12. ;

15            format VAR3 best12. ;

16           format VAR4 best12. ;

17           format VAR5 best12. ;

18           format VAR6 best12. ;

19           format VAR7 best12. ;

20           format VAR8 best12. ;

21            format VAR9 best12. ;

22         input

23                     VAR1 $

24                     VAR2

25                    VAR3

26                     VAR4

27                     VAR5

28                     VAR6

29                     VAR7

30                    VAR8

31                     VAR9

32         ;

33

34        run;

NOTE: The infile 'c:\temp\forum\sheet555.csv' is:

      Filename=c:\temp\forum\sheet555.csv,

      RECFM=V,LRECL=32767,File Size (bytes)=2051,

      Last Modified=29Jan2013:15:53:30,

      Create Time=29Jan2013:18:00:47

NOTE: Invalid data for VAR4 in line 62 16-16.

RULE:     ----+----1----+----2----+----3----+----4----+----5----+----6----+----7---

62        BBB,-4.6,-18.1,X,-4.5,1.3,-9.6,-9.3,-7 38

VAR1=BBB VAR2=-4.6 VAR3=-18.1 VAR4=. VAR5=-4.5 VAR6=1.3 VAR7=-9.6 VAR8=-9.3 VAR9=-7

_ERROR_=1 _N_=62

NOTE: 71 records were read from the infile 'c:\temp\forum\sheet555.csv'.

      The minimum record length was 8.

      The maximum record length was 53.

NOTE: The data set WORK.TEMP has 71 observations and 9 variables.

NOTE: DATA statement used (Total process time):

      real time           0.29 seconds

      cpu time            0.03 seconds

35   data wide (drop=var1);

36   set temp;

37   newid=round((_n_+1)/4,1);

38   if missing(var1) then delete;

39   id=upcase(scan(var1,1));

40   run;

NOTE: There were 71 observations read from the data set WORK.TEMP.

NOTE: The data set WORK.WIDE has 54 observations and 10 variables.

NOTE: DATA statement used (Total process time):

      real time           0.12 seconds

      cpu time            0.04 seconds

41

42   proc transpose data=wide out=wanted(drop=_:);

43      by newid;

44      id id;

45      var var2-var9;

46   run;

NOTE: There were 54 observations read from the data set WORK.WIDE.

NOTE: The data set WORK.WANTED has 144 observations and 4 variables.

NOTE: PROCEDURE TRANSPOSE used (Total process time):

      real time           0.22 seconds

      cpu time            0.00 seconds

47   data long;

48      set wanted(where=(aaa ne .));

49   run;

NOTE: There were 114 observations read from the data set WORK.WANTED.

      WHERE aaa not = .;

NOTE: The data set WORK.LONG has 114 observations and 4 variables.

NOTE: DATA statement used (Total process time):

      real time           0.01 seconds

      cpu time            0.00 seconds

hdodson_pacificmetrics_com
Calcite | Level 5

Yeah, that 'X' messed up my input method as well. Looks like the data contain "X's" for missing data?

PGStats
Opal | Level 21

Finally got around doing it :

data have;
array x{16}; /* Increase if needed */
infile "&sasforum.\datasets\sheet555.csv" dsd missover;
input type $ (x{*}) (??);
type = substr(type,1,3);
if not missing(type);
if upcase(type) = "AAA" then newid + 1;
run;

proc transpose data=have out=want(drop=_name_ where=(n(AAA,BBB,CCC)>0));
by newid;
var x:;
id type;
run;

PG

PG
Linlin
Lapis Lazuli | Level 10

Hi PG,

Thank you very much!!!  I am glad I posted the question. Would you please explain more about the double question marks?

input type $ (x{*}) (??);


Linlin

PGStats
Opal | Level 21

Hi!

From the doc :

The ? modifier suppresses the invalid data message. The ?? modifier also

suppresses the invalid data message and, in addition, prevents the automatic

variable _ERROR_ from being set to 1 when invalid data are read.

PG


PG
Linlin
Lapis Lazuli | Level 10

Thank you PG! I wonder what caused the invalid message, Maybe some invisible characters in the file?

PGStats
Opal | Level 21

As hdodson mentioned, there is a letter X on line 62 of the csv file where there should be a number.

PG
Linlin
Lapis Lazuli | Level 10

Thank you PG! I did not notice the "X".

Good night! - Linlin

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 11 replies
  • 1503 views
  • 7 likes
  • 4 in conversation