SAS Support Communities

davidsmarch · ‎11-22-2019

Sorry, I just saw this option missing solution, that solves the problem of the . If i could set both of these as solutions I would. Thanks fellas!

davidsmarch · ‎11-22-2019

Thanks, Paige, that was indeed part of the issue! I tried everything, but I indeed needed to define the x_cat as a character before the concatenate step. However, there is now the issue that it is not seeing the . as missing, but instead is doing: 42,52,118,165,206,239,247,249,267,270,272,274,276,278,278,278,281,281,289,289,289,289,289,289,289,289,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,. I thought the CATX ignored blanks. All the x variables are numerical, so I'm not sure why it's seeing these as data points. To your question about the utility of such long strings, this is a particular format of mouse-tracking data for an R package called mouse-trap that requires every X- and Y- path within a single trial to be concatenated into a single cell.

davidsmarch · ‎11-22-2019

I've tried doing this in so many dirty ways, I figured it's time to ask for help. I have 375 x variables that go x1-x375 and and one trial variable that counts up 1-180 in a single column. Each trial can have a different number of x's, anywhere from x1-x25 as the shortest all the way to x1-x375 as the longest. Yet every trial has x1-x375. The x's that don't have data are missing or a . Trial X1 X2 X3 X4 X5 . . . . . . X374 X375 1 22 25 46 48 79 . . I'm just trying to concatenate all the X's together. I tried: x_cat = catx(",", of x:); And it keeps returning an error "Invalid numeric data". I even tried filling in every . with another value then converting it to a character and I still get the same issue. Any help would be very much appreciated.

davidsmarch · ‎11-21-2019

That's fantastic. Super quick process, thanks for the solution! I'm trying to understand how it works and my ability there is limited. I follow about half of what's going on.

davidsmarch · ‎11-17-2019

I saw some previous solutions that don't quite fit, so I though I'd try to see if I could get some help. I have two datasets. One dataset (time.dat) is very long (millions of rows) and contains three variables: subject, column, and time_n in this format: subj col time_n 2 1 1564623266038 2 2 1564623266044 2 3 1564623266058 2 4 1564623266073 2 5 1564623266090 2 6 1564623266106 . . . Each subject has 1000's of columns that each have a unique time_n. There are also many subjects. The second set (onloadtime.dat) has all of the same subjects and three variables, subject, trial, and onloadtime_n in this format: subj trial onloadtime_n 2 1 1564623265779 2 2 1564623272356 2 3 1564623275373 2 4 1564623278728 2 5 1564623282137 Each subject only has 180 onloadtime's. What I need to do is find the closest time_n to each onloadtime_n and to indicate which trial corresponds to that time. Such that the resultant data looks like: subj col time_n onloadtime_n trial 2 1 1564623266038 1564623265779 1 2 2 1564623266044 2 3 1564623266058 2 4 1564623266073 2 5 1564623266090 2 6 1564623266106 . . . 2 450 1564623272324 1564623272356 2 And to repeat this per subject. Essentially I need to know what trial in the shorter set (onloadtime) corresponds to what closest column in the long set (time), and to the indicate what time trial that actually is. I hope that makes sense. I'd love any support and would be happy to answer any questions.

davidsmarch · ‎11-17-2019

I saw some previous solutions that don't quite fit, so I though I'd try to see if I could get some help. I have two datasets. One dataset (time.dat) is very long (millions of rows) and contains three variables: subject, column, and time_n in this format: Obs subj col time_n 1 2 1 1564623266038 2 2 2 1564623266044 3 2 3 1564623266058 4 2 4 1564623266073 5 2 5 1564623266090 6 2 6 1564623266106 Each subject has 1000's of columns that each have a unique time_n. The second set (onloadtime.dat) has all of the same subjects and three variables, subject, trial, and onloadtime_n in this format: Obs subj trial onloadtime_n 1 2 1 1564623265779 2 2 2 1564623272356 3 2 3 1564623275373 4 2 4 1564623278728 5 2 5 1564623282137 Each subject only has 180 onloadtime's. What I need to do is find the closest time_n to each onloadtime_n and to indicate which trial corresponds to that time. Such that the resultant data looks like: Obs subj col time_n onloadtime_n trial 1 2 1 1564623266038 1564623265779 1 2 2 2 1564623266044 3 2 3 1564623266058 4 2 4 1564623266073 5 2 5 1564623266090 6 2 6 1564623266106 . . . 450 2 450 1564623272324 1564623272356 2 And to repeat this per subject. Essentially I need to know what trial in the shorter set (onloadtime) corresponds to what closest column in the long set (time), and to the indicate what time trial that actually is. I hope that makes sense. I'd love any support and would be happy to answer any questions.

davidsmarch · ‎11-17-2019

Apologies. Correct, though the length in each cell is shorter, I saw the process as needing to get the entire string split wide, then transpose to long. But your solution works perfectly!

davidsmarch · ‎11-15-2019

Looks like an interesting solution! I tried to adapt it and and while it's parsing the first column it's not moving past that. The delimiter in my data is actually the letter "a". So a string would look like : 947a946a945a944a943a942a941a940a939a938a937a936a933a1029a926a1119a897a875a850a828a811a798a791a786a786a785 But out to hundreds of thousands of characters. So the string between the a's is never longer than 4. I like the idea of creating a new variable for each value between the a's. I'd then look to transpose that, but that's a later step. The way I altered your code is below: data test; infile 'C:\Users\dsm6713\Google Drive\Academic\y_var.tsv' dsd dlm='a' truncover ; length y1-y200 $4; array y[200]; do i=1 to dim(y) until(loc) ; if i=dim(y) then input y[i] :$char32767. @; else input y[i] $char32767. @; loc = indexc(y[i],'a'); end; if loc then do; y[i]=substrn(y[i],1,loc-1); input +(loc-32767) @ ; end; run;

davidsmarch · ‎11-14-2019

Sorry, but I'm not sure what you mean. I only included that example as it worked for the strings of data that example used. My data is much too large to paste into SAS, so I am trying to import it from a delimited file. My first post is the closest I've gotten, and I included a smaller version of the input data file on that post.

davidsmarch · ‎11-14-2019

Astounding, I just tried that, and same result. It cuts off Ypart1 at 32767 and includes a blank Ypart2 variable. It's not spilling over. The original solution below works, but only as they included the data as parmcards: filename FT15F001 temp; data lines; infile FT15F001 truncover; array part[3] $40.; input (part[*])($char40.); parmcards; this is the data line this is the line this is the data line this is the data line this is the data line this is the data line this is the data line this is the data line this is the data line this is the data line ;;;; run; I'm including this as perhaps it will make more sense to you why theirs works.

davidsmarch · ‎11-14-2019

I'm trying to import multiple variables are each strings and can be more than 32767 a piece. I know the limit for a string variable is 32767, but I'm hoping to import such that if a variable reaches the max it spills over into another variable. I saw a partial fix in another post, but I can't get it to work. For trying it out, I limited the dataset to only 1 variable. To make it easier I also took the header row out o the dataset. This variable is just called Y. The current form is tab-delimited, but I've also unsuccessfully tried other formats. Below I tried to import the Y variable as a possible array with a first variable Ypart1 up to 32767 and let it spill over into a Ypart2 if needed. As it is, it just fills up the first Ypart1 to the limit, makes a Ypart2, but no data appears there. I've attached the limited datafile. I can do this per variable, if necessary. Obviously it'd be best to do it all at once. Thanks for any guidance! filename y_var temp; data y_test; infile 'C:\Users\dsm6713\Google Drive\Academic\y_var.tsv' truncover; array Ypart[2] $32767.; input (Ypart[*])($char32767.); run;

Online Status	Offline
Date Last Visited	‎12-02-2019 01:12 PM

SAS Support Communities

Re: CATX invalid numeric value

Re: CATX invalid numeric value

CATX invalid numeric value

Re: Finding the closest value from two datasets

Finding the closest value from two datasets

Finding the closest value from two datasets

Re: Importing string larger than 32767

Re: Importing string larger than 32767

Re: Importing string larger than 32767

Re: Importing string larger than 32767

Re: CATX invalid numeric value

Re: CATX invalid numeric value

Re: CATX invalid numeric value

CATX invalid numeric value

Re: Finding the closest value from two datasets

Finding the closest value from two datasets

Finding the closest value from two datasets

Re: Importing string larger than 32767

Re: Importing string larger than 32767

Re: Importing string larger than 32767

Re: Importing string larger than 32767

Importing string larger than 32767

Follow Us

What is...