But i'm not there yet.. almost... 😉
Proc Datasets Lib=Work memtype=Data Kill nolist;
Quit;
options validvarname = V7;
%let resp =P:\TEMP\;
%let resf =have.csv;
PROC IMPORT
datafile="&resp.&resf"
out=RES dbms=dlm replace;
delimiter=';';
getnames=yes;
guessingrows=MAX;
datarow=2;
run;
PROC SQL;
CREATE table WORK.tempTable AS
SELECT distinct *
FROM WORK.RES as T
;
QUIT;
proc sort data=work.tempTable; by NBR ID; run;
data tries;
set tempTable;
BY NBR id;
if FIRST.ID then
tries = 0;
tries + 1;
run;
proc sort data=work.tries; by id tries; run;
PROC TRANSPOSE DATA=WORK.TRIES
OUT=WORK.TRANSPOSED(LABEL="Transposed")
PREFIX=v
DELIMITER=_
NAME=Source
LABEL=Label
LET
;
BY ID tries;
ID NBR DESCR1;
VAR GRADE;
RUN; QUIT;
/* how to change this proc transpose so it gives me what I want like in screenshot? */
Can some one please help me with the correct result?
TiA!
Seems pretty straight forward.
Read the data. Sort. Add the TRY counter so you can generate unique names. And transpose.
Your GRADE variable appears to be mainly numeric but has some observations with values like 'O' and 'V'. You can use the missing option to tell SAS to read those as special missing values of .O and .V. Or you could read GRADE as character variable. Whether GRADE is numeric or character does not cahnge the transposing logic, just the type of variables that end up being generated.
missing o v ;
data RES ;
infile "c:\downloads\have.csv" dsd dlm=';' firstobs=2 truncover ;
input ID NBR :$20. DESCRIPTION :$8. DATE :ddmmyy. GRADE ;
format date yymmdd10.;
run;
proc sort data=res;
by id nbr description date ;
run;
data for_transpose ;
set res;
by id nbr description;
try+1;
if first.description then try=1;
run;
proc transpose data=for_transpose
out=want(drop=_name_)
prefix=NBR_
delim=_
;
by id;
id nbr description try;
var grade ;
run;
proc print;
run;
NBR_00000000P_ NBR_00000000P_ NBR_11111111T_ NBR_22222222S_ NBR_22222222S_ NBR_11111111T_ Obs ID BP_1 BP_2 BR_1 BS_1 BS_2 BR_2 1 2887 4 7 6 O V . 2 2888 7 . 9 V . . 3 2889 9 . 5 V . 6
First thing, if you want a process repeatable with a different data file do not use proc import. There is not guarantee that your variable types or lengths will stay the same. Write a data step and take control of your data.
Second, your picture shows column headings that would not be standard SAS variable names. So are those supposed to be the names of the variables or a label for the heading?
Third, there really isn't any 'nice' way to have "first" "second" "third" and "fourth" in a sequential name, the digits would be better.
If you really want those "headings" I suggest creating heading in the data state as a separate variable and use that as an IDLABEL. You may want a custom format to display 1 as "first' , 2 as 'second' and so on. You could then use
labelvar = catx('_',id,put(try, tryformat.),'try');
make sure to assign a length long enough to hold the longest expected combination.
@SAS_Question wrote:
...
1) I must use import, because the real csv is really big:more than 1 million rows... so, i can't write a data step 🙂
...
That statement makes no sense. The complexity of the data step is related to the number of variables, not the number of observations. Your posted example only has 5 variables. All you need to do is decide how you want to define those variables and you can do a much better job of making that decision that PROC IMPORT could ever hope to do. The data step code is shorter than the PROC IMPORT code it replaces.
data RES ;
infile "&resp.&resf" dsd dlm=';' firstobs=2 truncover ;
input ID NBR :$20. DESCR1 :$8. DATE :ddmmyy. GRADE ;
format date yymmdd10.;
run;
Yes you are right. And so is Ballardw. Sorry. I misunderstood his advice!
I thought he meant making a data set like this:
data test; infile datalines expandtabs; input (STUDENT_ID COURSE_ID) ($) GRADE_POINTS EXAM_DATE :yymmdd10.; format EXAM_DATE yymmdd10.; datalines; 1111 1000T 6 2017-07-05 1111 2000T 7 2017-10-31 2222 1000T 8 2017-10-30 2222 2000T 8 2017-01-10 2222 3000T 8 2017-01-05 2222 4000T 9 2017-03-22 2222 5000T 9 2017-03-20 2222 6000T 8 2017-05-29 2222 7000T 9 2017-06-06 2222 8000T 7 2017-05-31 3333 4000T 9 2015-10-26 4444 1000T 7.5 2015-10-09 4444 2000T 8 2015-10-06 4444 3000T 7 2016-11-23 4444 4000T 7 2015-11-19 4444 5000T 7.5 2015-12-14 4444 6000T 6 2017-03-02 4444 7000T 7 2016-06-10 4444 8000T 6 2016-04-14 4444 8000T 8 2016-05-14 ;
But he meant only this:
infile datalines expandtabs; input (STUDENT_ID COURSE_ID) ($) GRADE_POINTS EXAM_DATE :yymmdd10.; format EXAM_DATE yymmdd10.;
without data lines.
And what the output will be used for... it is for concatenating it with other data of these students and seeing how a student is preforming in a particular group.
But all I know is, I need to deliver it like that example 🙂
@SAS_Question wrote:
Yes you are right. And so is Ballardw. Sorry. I misunderstood his advice!
Juola's Metatheorem: You will seldom misunderstand a problem so as to make it simpler.
Dr. Juola was my first professor for statistics.
I call this a metatheorem because it has applied to carpentry, car repairs, sewing and other projects besides statistics or math.
What are you going to do with that dataset once you create it? That structure does not look very useful for anything.
Seems pretty straight forward.
Read the data. Sort. Add the TRY counter so you can generate unique names. And transpose.
Your GRADE variable appears to be mainly numeric but has some observations with values like 'O' and 'V'. You can use the missing option to tell SAS to read those as special missing values of .O and .V. Or you could read GRADE as character variable. Whether GRADE is numeric or character does not cahnge the transposing logic, just the type of variables that end up being generated.
missing o v ;
data RES ;
infile "c:\downloads\have.csv" dsd dlm=';' firstobs=2 truncover ;
input ID NBR :$20. DESCRIPTION :$8. DATE :ddmmyy. GRADE ;
format date yymmdd10.;
run;
proc sort data=res;
by id nbr description date ;
run;
data for_transpose ;
set res;
by id nbr description;
try+1;
if first.description then try=1;
run;
proc transpose data=for_transpose
out=want(drop=_name_)
prefix=NBR_
delim=_
;
by id;
id nbr description try;
var grade ;
run;
proc print;
run;
NBR_00000000P_ NBR_00000000P_ NBR_11111111T_ NBR_22222222S_ NBR_22222222S_ NBR_11111111T_ Obs ID BP_1 BP_2 BR_1 BS_1 BS_2 BR_2 1 2887 4 7 6 O V . 2 2888 7 . 9 V . . 3 2889 9 . 5 V . 6
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.