Slide 106 causes "undefined columns selected" error
- This topic has 6 replies, 2 voices, and was last updated 4 years, 6 months ago by
Merav Yuravlivker.
October 9, 2016 at 6:02 pm EDT #18106
JayC
Participant
On slide 106, the clust_data_NBA call shown in the slide causes an error.
The first error is easy to fix because the slide uses “STLPG” rather than “STPG”. However, once that is corrected, the command still refuses to run, and I’ve no idea why. I’ve gone over the column names multiple times but still can’t get past this step.
My code:
clust.data.NBA <- NBA[, c("MGP", "FG_PC", "X3P_PC", "FT_PC",
                          "APG", "PPG", "STPG", "BLKPG")]
The error:
Error in `[.data.frame`(NBA, , c("MGP", "FG_PC", "3P_PC", "FT_PC", "APG", :
  undefined columns selected
NBA.csv has an APG column and this code is basically identical to the slide.
October 9, 2016 at 6:22 pm EDT #18107
Merav Yuravlivker
Keymaster
Hi JayC,
Thanks for your comment! The error “undefined columns selected” usually means that your code is trying to access a column that doesn’t exist. It looks like your code has a column “X3P_PC”, while the error message has a column name “3P_PC”. I recommend rechecking your column names, as there may be a typo in one of them. You can always run colnames(NBA) to see a list of all the column names in the console and then copy and paste the names into your code.
There are some other ways to subset the columns, such as identifying the column numbers instead of names or using the subset function. You can always test those out to see if they work.
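The alternatives mentioned above can be sketched roughly as follows. This is only an illustration on a small made-up data frame (the `toy` object and its values are assumptions, not the course's NBA.csv); it shows selecting the same columns by name, by position, and with subset().

```r
# Toy stand-in for the NBA data; the values are made up for illustration.
toy <- data.frame(
  NAME   = c("A", "B", "C"),
  MPG    = c(30.1, 25.4, 18.9),
  PPG    = c(20.3, 12.8, 7.5),
  X3P_PC = c(0.38, 0.41, 0.29)
)

# 1. Select by column name (the approach used on the slide).
by_name <- toy[, c("MPG", "PPG", "X3P_PC")]

# 2. Select by column position; this avoids typing names at all,
#    but silently breaks if the column order changes.
by_index <- toy[, c(2, 3, 4)]

# 3. Select with subset(); column names are given unquoted.
by_subset <- subset(toy, select = c(MPG, PPG, X3P_PC))

# All three should carry the same column names.
colnames(by_name)
colnames(by_index)
colnames(by_subset)
```

Selecting by name is usually the most readable option, but it is also the one most exposed to typos, which is why checking against colnames() first is worthwhile.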
Let me know if that helps!
Best,
Merav
October 9, 2016 at 6:42 pm EDT #18110
JayC
Participant
The colnames command produces:
> colnames(NBA)
 [1] "NAME"      "TEAM"      "SALARY.M." "GP"        "MPG"
 [6] "PPG"       "FG_PC"     "X3P_PC"    "FT_PC"     "RPG"
[11] "APG"       "STPG"      "BLKPG"     "POSITION"
Note that the file NBA.csv has a column titled “3P_PC”, and R doesn’t allow columns to start with a number (so it adds the “X”). This was covered in the data viz course I believe.
My point is that the code on your slide does not work and I do not understand why.
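One way to pin down which name is undefined is to compare the requested names against colnames(NBA) with setdiff(). In the code posted at the top of this thread the first name is typed "MGP", while the colnames output lists "MPG", so that name would be flagged. Below is a minimal sketch, assuming a small made-up CSV in place of NBA.csv (the values and the `csv_text`, `toy`, `wanted`, and `missing_cols` names are illustrative only). It also shows the "X" prefix that read.csv() adds by default to a header starting with a digit.

```r
# Toy CSV text standing in for NBA.csv; the values are made up.
csv_text <- "MPG,FG_PC,3P_PC,APG\n30.1,0.45,0.38,5.2\n25.4,0.41,0.33,3.1"

# read.csv() uses check.names = TRUE by default, so the header "3P_PC"
# is converted to the syntactically valid name "X3P_PC".
toy <- read.csv(text = csv_text)
colnames(toy)

# Names as typed in the failing call, including the transposed "MGP".
wanted <- c("MGP", "FG_PC", "X3P_PC", "APG")

# Any name returned here does not exist in the data frame and would
# trigger "undefined columns selected".
missing_cols <- setdiff(wanted, colnames(toy))
missing_cols
```

Running the same setdiff() check against the real data frame before subsetting points directly at the offending name instead of leaving it to be spotted by eye.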
October 9, 2016 at 7:02 pm EDT #18112
Merav Yuravlivker
Keymaster
Hi JayC,
I just ran the code
NBA = read.csv("NBA.csv")
clust_data_NBA = NBA[, c("MPG", "PPG", "FG_PC", "X3P_PC",
                         "FT_PC", "RPG", "APG", "STPG", "BLKPG")]
I also ran the code from the slide (as copied below):
clust_data_NBA = NBA[, c("MPG", "FG_PC", "X3P_PC", "FT_PC",
                         "APG", "PPG", "STPG", "BLKPG")]
Both of these variations worked when I ran them. I recommend restarting R and RStudio, as this tends to solve problems like this, where the code does not seem to be working correctly.
Let me know if that worked!
Best,
Merav
October 10, 2016 at 10:22 pm EDT #18231
JayC
Participant
So the restart didn’t work, and neither did the copy/paste of your code into my code.
But I basically rebuilt the entire expression one variable at a time and for some mysterious reason it worked.
Weird are the ways of R.
October 10, 2016 at 10:33 pm EDT #18232
JayC
Participant
Also, for some reason my code is spitting out that the best number of clusters is 3, not 2.
October 11, 2016 at 9:50 am EDT #18233
Merav Yuravlivker
Keymaster
Hi JayC,
You are correct that R can work in some weird ways! And for the NBA data set, I believe that we state the best number of clusters is 3, although we go over what kmeans looks like with only 2 centers to illustrate the differences.
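The thread does not say how the course arrives at the best number of clusters. As a rough illustration only, one common way to compare 2 and 3 centers is to look at the total within-cluster sum of squares that kmeans() reports. The sketch below uses R's built-in iris measurements as a stand-in because the NBA data is not reproduced here; the seed, the scaling step, and nstart = 25 are all assumptions.

```r
# Built-in iris measurements used as a stand-in for the clustering data.
set.seed(42)                  # kmeans() starts from random centers
x <- scale(iris[, 1:4])       # put the variables on a comparable scale

# Fit k-means with 2 and 3 centers and keep the total within-cluster
# sum of squares for each fit; lower means tighter clusters.
ks <- 2:3
wss <- sapply(ks, function(k) {
  kmeans(x, centers = k, nstart = 25)$tot.withinss
})
names(wss) <- paste0("k=", ks)
wss
```

Within-cluster sum of squares generally keeps falling as centers are added, so the comparison is about how large the drop is rather than which value is smallest.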
Let me know if you have any other questions!
Best,
Merav