BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
SAS_Question
Quartz | Level 8
  • I use SAS EG 7.1 version 9.4M2

 

  • I have this data set in csv: have.csv (see attached)

 

  • I want this as result I want t achieve:

What i want to achieveWhat i want to achieve

But i'm not there yet.. almost... 😉

 

  • This is my SAS code so far: 
Proc Datasets Lib=Work memtype=Data Kill nolist;
Quit;

options validvarname = V7; 

%let resp =P:\TEMP\;
%let resf =have.csv;

PROC IMPORT 
	datafile="&resp.&resf"
  	out=RES dbms=dlm replace;
  	delimiter=';';   
	getnames=yes; 
	guessingrows=MAX;
	datarow=2;
run;

PROC SQL;
	CREATE table WORK.tempTable AS
		SELECT distinct *
	FROM WORK.RES as T
;
QUIT;

proc sort data=work.tempTable; by NBR ID; run; 

data tries;
   set tempTable;
   BY NBR id;
   if FIRST.ID then
		tries = 0;
		tries + 1;
run;

proc sort data=work.tries; by id tries; run;
 
PROC TRANSPOSE DATA=WORK.TRIES
	OUT=WORK.TRANSPOSED(LABEL="Transposed")
	PREFIX=v 
	DELIMITER=_ 
	NAME=Source
	LABEL=Label 
	LET
;
	BY ID tries;
	ID NBR DESCR1;
	VAR GRADE;
RUN; QUIT;

/* how to change this proc transpose so it gives me what I want like in screenshot? */


Can some one please help me with the correct result? 

TiA! 

 

 

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

Seems pretty straight forward.

Read the data. Sort. Add the TRY counter so you can generate unique names. And transpose.

 

Your GRADE variable appears to be mainly numeric but has some observations with values like 'O' and 'V'.  You can use the missing option to tell SAS to read those as special missing values of .O and .V.  Or you could read GRADE as character variable.  Whether GRADE is numeric or character does not cahnge the transposing logic, just the type of variables that end up being generated.

missing o v ;

data RES ;
  infile "c:\downloads\have.csv" dsd dlm=';' firstobs=2 truncover ;
  input ID NBR :$20. DESCRIPTION :$8. DATE :ddmmyy. GRADE ;
  format date yymmdd10.;
run;

proc sort data=res;
  by id nbr description date ;
run;

data for_transpose ;
  set res;
  by id nbr description;
  try+1;
  if first.description then try=1;
run;

proc transpose data=for_transpose 
  out=want(drop=_name_)
  prefix=NBR_
  delim=_
;
  by id;
  id nbr description try;
  var grade ;
run;

proc print;
run;
         NBR_00000000P_ NBR_00000000P_ NBR_11111111T_ NBR_22222222S_ NBR_22222222S_ NBR_11111111T_
Obs  ID       BP_1           BP_2           BR_1           BS_1           BS_2           BR_2

 1  2887        4              7              6              O              V              .
 2  2888        7              .              9              V              .              .
 3  2889        9              .              5              V              .              6

View solution in original post

8 REPLIES 8
ballardw
Super User

First thing, if you want a process repeatable with a different data file do not use proc import. There is not guarantee that your variable types or lengths will stay the same. Write a data step and take control of your data.

 

Second, your picture shows column headings that would not be standard SAS variable names. So are those supposed to be the names of the variables or a label for the heading?

 

Third, there really isn't any 'nice' way to have "first" "second" "third" and "fourth" in a sequential name, the digits would be better.

 

If you really want those "headings" I suggest creating heading in the data state as a separate variable and use that as an IDLABEL. You may want a custom format to display 1 as "first' , 2 as 'second' and so on. You could then use

 

labelvar = catx('_',id,put(try, tryformat.),'try');

 

make sure to assign a length long enough to hold the longest expected combination.

SAS_Question
Quartz | Level 8
Thank you for the reply @ballardw
1) I must use import, because the real csv is really big:more than 1 million rows... so, i can't write a data step 🙂
2) I came across that myself too... you can't let a name for a sas variable begin with a number. True! As you can see in my code I already solved that with: PREFIX=v in Proc Transpose. So you get: v11111111111_ etc.
3) The names were just an idea. You may name it whatever you want! As long as the result is clear... what I want. So it can also be 1, 2 and 3 yes of course!
Tom
Super User Tom
Super User

@SAS_Question wrote:
...
1) I must use import, because the real csv is really big:more than 1 million rows... so, i can't write a data step 🙂
...

That statement makes no sense. The complexity of the data step is related to the number of variables, not the number of observations.  Your posted example only has 5 variables.  All you need to do is decide how you want to define those variables and you can do a much better job of making that decision that PROC IMPORT could ever hope to do.  The data step code is shorter than the PROC IMPORT code it replaces.

 

data RES ;
  infile "&resp.&resf" dsd dlm=';' firstobs=2 truncover ;
  input ID NBR :$20. DESCR1 :$8. DATE :ddmmyy. GRADE ;
  format date yymmdd10.;
run;

 

SAS_Question
Quartz | Level 8

Yes you are right. And so is Ballardw. Sorry. I misunderstood his advice!

I thought he meant making a data set like this: 

 

data test;
infile datalines expandtabs;
input (STUDENT_ID COURSE_ID) ($) GRADE_POINTS EXAM_DATE :yymmdd10.;
format EXAM_DATE yymmdd10.;
datalines;
1111 1000T 6 2017-07-05
1111 2000T 7 2017-10-31
2222 1000T 8 2017-10-30
2222 2000T 8 2017-01-10
2222 3000T 8 2017-01-05
2222 4000T 9 2017-03-22
2222 5000T 9 2017-03-20
2222 6000T 8 2017-05-29
2222 7000T 9 2017-06-06
2222 8000T 7 2017-05-31
3333 4000T 9 2015-10-26
4444 1000T 7.5 2015-10-09
4444 2000T 8 2015-10-06
4444 3000T 7 2016-11-23
4444 4000T 7 2015-11-19
4444 5000T 7.5 2015-12-14
4444 6000T 6 2017-03-02
4444 7000T 7 2016-06-10
4444 8000T 6 2016-04-14
4444 8000T 8 2016-05-14
;

But he meant only this: 

infile datalines expandtabs;
input (STUDENT_ID COURSE_ID) ($) GRADE_POINTS EXAM_DATE :yymmdd10.;
format EXAM_DATE yymmdd10.;

without data lines. 

And what the output will be used for... it is for concatenating it with other data of these students and seeing how a student is preforming in a particular group.  

But all I know is, I need to deliver it like that example 🙂 

 

 

ballardw
Super User

@SAS_Question wrote:

Yes you are right. And so is Ballardw. Sorry. I misunderstood his advice!

Juola's Metatheorem: You will seldom misunderstand a problem so as to make it simpler.

 

Dr. Juola was my first professor for statistics.

I call this a metatheorem because it has applied to carpentry, car repairs, sewing and other projects besides statistics or math.

Tom
Super User Tom
Super User

What are you going to do with that dataset once you create it?  That structure does not look very useful for anything.

Tom
Super User Tom
Super User

Seems pretty straight forward.

Read the data. Sort. Add the TRY counter so you can generate unique names. And transpose.

 

Your GRADE variable appears to be mainly numeric but has some observations with values like 'O' and 'V'.  You can use the missing option to tell SAS to read those as special missing values of .O and .V.  Or you could read GRADE as character variable.  Whether GRADE is numeric or character does not cahnge the transposing logic, just the type of variables that end up being generated.

missing o v ;

data RES ;
  infile "c:\downloads\have.csv" dsd dlm=';' firstobs=2 truncover ;
  input ID NBR :$20. DESCRIPTION :$8. DATE :ddmmyy. GRADE ;
  format date yymmdd10.;
run;

proc sort data=res;
  by id nbr description date ;
run;

data for_transpose ;
  set res;
  by id nbr description;
  try+1;
  if first.description then try=1;
run;

proc transpose data=for_transpose 
  out=want(drop=_name_)
  prefix=NBR_
  delim=_
;
  by id;
  id nbr description try;
  var grade ;
run;

proc print;
run;
         NBR_00000000P_ NBR_00000000P_ NBR_11111111T_ NBR_22222222S_ NBR_22222222S_ NBR_11111111T_
Obs  ID       BP_1           BP_2           BR_1           BS_1           BS_2           BR_2

 1  2887        4              7              6              O              V              .
 2  2888        7              .              9              V              .              .
 3  2889        9              .              5              V              .              6

SAS_Question
Quartz | Level 8
Thanks Tom!!! You saved my day with this really simple solution... why am I thinking so hard... 😞

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 8 replies
  • 784 views
  • 1 like
  • 3 in conversation