BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Stu
Calcite | Level 5 Stu
Calcite | Level 5

I have an extremely large csv dataset (in the millions) that is    formatted in an odd way.  There is only one column for variable    values, and another column with a code for what each value is    measuring.  So, for example, if I had a small dataset with two    variables A and B that was formatted the normal way:
   
    Year     A       B
    1994   10     30
    1994   20     40

    In the scheme I was given, it would be formatted in this way:
   
    Year   Value     Code     CodeDescription
    1994   10         2844                  "A"
    1994   20         2844                  "A"
    1994   30         2855                  "B"
    1994   40         2855                  "B"

    Using some simple "if" statements, I can read in the data in this    way to separate it all out:
   
    if(Code = 2844) then A = Value
    if(Code = 2855) then B = Value
   
    When I read in the data and sort it by year, it will come out in    this way:
   
    Year        A    B
    1994      10    .
    1994      20    .
    1994        .    30
    1994        .    40

    Even if I sort it by date, it will come out this way.  I would like    it to be formatted as such:
   
    Year       A     B
    1994     10    30
    1994     20    40

I've tried a bunch of different options but cannot seem to get it to sort properly.  Some of the variables have more measurements than others, and my goal is to have it aligned such that any missing variables will be at the end of the date sets, and the data can be aligned as best as it can be.  I have found that, due to the way the data is originally formatted, SAS will treat each read as an individual observation, and set any other variables to missing.
   

Does anyone know of a way I can do this?  I have tried everything from exporting it and re-reading it in to transposing twice to see if that would make any sort of difference.

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

It is not clear to me that it is valid to do what you want.  Why did you put A=10 and B=30 in the same row instead of A=20 and B=30?

If we assume that the values of the same variable for the same year can be matched in the order that they appear in the source file then we just need to introduce a new variable to let PROC TRANSPOSE do the work for us.

data have ;

  length name $34 ;

  input year value code name $ ;

  name=dequote(name);

cards;

1994 10 2844 "A"

1994 20 2844 "A"

1994 30 2855 "B"

1994 40 2855 "B"

run;

data vertical ;

  set have ;

  by year code ;

  if first.code then row=1;

  else row+1;

run;

proc sort ;

  by year row code ;

run;

proc transpose data=vertical out=want ;

  by year row ;

  id name ;

  var value ;

run;

Obs    year    row    _NAME_     A     B

1     1994     1     value     10    30

2     1994     2     value     20    40

View solution in original post

3 REPLIES 3
Tom
Super User Tom
Super User

It is not clear to me that it is valid to do what you want.  Why did you put A=10 and B=30 in the same row instead of A=20 and B=30?

If we assume that the values of the same variable for the same year can be matched in the order that they appear in the source file then we just need to introduce a new variable to let PROC TRANSPOSE do the work for us.

data have ;

  length name $34 ;

  input year value code name $ ;

  name=dequote(name);

cards;

1994 10 2844 "A"

1994 20 2844 "A"

1994 30 2855 "B"

1994 40 2855 "B"

run;

data vertical ;

  set have ;

  by year code ;

  if first.code then row=1;

  else row+1;

run;

proc sort ;

  by year row code ;

run;

proc transpose data=vertical out=want ;

  by year row ;

  id name ;

  var value ;

run;

Obs    year    row    _NAME_     A     B

1     1994     1     value     10    30

2     1994     2     value     20    40

Stu
Calcite | Level 5 Stu
Calcite | Level 5

Thank you so much. This did exactly what I needed it to do.  I am an undergraduate in stats and am still grasping the more advanced portions of SAS, and this has been an extraordinary help.  By having the data in this way, I am able to perform the quantile regression analysis I have been asked to do.

Linlin
Lapis Lazuli | Level 10

data have;

input Year   Value     Code     CodeDescription $;

cards;

1994   10         2844                  "A"

1994   20         2844                  "A"

1994   30         2855                  "B"

1994   40         2855                  "B"

;

data want;

  set have;

  cd=translate(CodeDescription,'','"');

proc transpose data=want out=want11;

by cd;

var value;

run;

proc transpose data=want11 out=want12;

id cd;

var col:;

run;

proc transpose data=want out=want21;

by cd;

var year ;

run;

proc transpose data=want21 out=want22 let;

id _name_;

run;

data final(drop=_name_);

merge want22 want12;

year=ifn(year eq . ,lag1(year),year);

run;

proc print;run;

Linlin

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 3 replies
  • 791 views
  • 4 likes
  • 3 in conversation