The Big R-Book. Philippe J. S. De Brouwer
Чтение книги онлайн.
Читать онлайн книгу The Big R-Book - Philippe J. S. De Brouwer страница 41
![The Big R-Book - Philippe J. S. De Brouwer The Big R-Book - Philippe J. S. De Brouwer](/cover_pre848614.jpg)
Figure 4.6: The histogram of the most recent values of the CAD only.
Without the row.names = FALSE
statement, the function write.csv()
would add a row that will get the name “X.”
4.8.2 Excel Files
Microsoft's Excel is omnipresent in the corporate environment and many people will have some data in that format. There is no need to to first save the data as a CSV file and then upload in R. The package xlsx
will allow us to directly import a file in xlsx format.
Excel
.xlx
Importing and xlsx-file is very similar to importing a CSV-file.
# install the package xlsx if not yet done if (!any(grepl(“xlsx”,installed.packages()))){ install.packages(“xlsx”)} library(xlsx) data <- read.xlsx(“input.xlsx”, sheetIndex = 1)
4.8.3 Databases
Spreadsheets and CSV-file are good carriers for reasonably small datasets. Any company that holds a lot of data will use a database system – see for example Chapter 13 “RDBMS” on page 285. Importing data from a database system is somewhat different. The data is usually structured in “tables” (logical units of rectangular data); however, they will seldom contain all information that we need and usually have too much rows.We need also some protocol to communicate with the database: that is the role of the structured query language (SQL) – see Chapter 14 “SQL on page 291.
database – MySQL
R can connect to many popular database systems. For example, MySQL: as usual there is a package that will provide this functionality.
MySQL
if(!any(grepl(“xls”, installed.packages()))){ install.packages(“RMySQL”)} library(RMySQL)
RMySQL
Connecting to the Database
This is explained in more detail in Chapter 15 “Connecting R to an SQL Database” on page 327, but the code segment below will already get you the essential ingredients.
# The connection is stored in an R object myConnection and # it needs the database name (db_name), username and password myConnection = dbConnect(MySQL(), user = ‘root’, password = ‘xxx’, dbname = ‘db_name’, host = ‘localhost’) # e.g. list the tables available in this database. dbListTables(myConnection)
dbConnect()
dbSendQuery()
Fetching Data Drom a Database
Once connected to the database, we can select and download the data that we will use for our analysis in R.
# Prepare the query for the database result <- dbSendQuery(myConnection, “SELECT * from tbl_students WHERE age > 33”) # fetch() will get us the results, it takes a parameter n, which # is the number of desired records. # Fetch all the records(with n = -1) and store it in a data frame. data <- fetch(result, n = -1)
Update Queries
It is also possible to manipulate data in the database directly from R. This allows use to prepare data in the database first and then download it and do our analysis, while we keep all the code in one file.
fetch()
The dbSendQuery()
function can be used to send any query, including UPDATE, INSERT, CREATE TABLE and DROP TABLE queries so we can push results back to the database.
sSQL = “” sSQL[1] <- “UPDATE tbl_students SET score = ‘A’ WHERE raw_score > 90;” sSQL[2] <- “INSERT INTO tbl_students (name, class, score, raw_score) VALUES (‘Robert’, ‘Grade 0’, 88,NULL);” sSQL[3] <- “DROP TABLE IF EXISTS tbl_students;” for (k in c(1:3)){ dbSendQuery(myConnection, sSQL[k]) }
dbSendQuery()
Create Tables from R Data-frames
Finally, it is possible to write back data from a dataset in R to a table in a database.
dbWriteTable(myConnection, “tbl_name”, data_frame_name[, ], overwrite = TRUE)
dbWriteTable()
Even while connections will be automatically closed when the scope of the database object is lost, it is a good idea to close a connection explicitly. Closing a connection explicitly, makes sure that it is closed and does not remain open if something went wrong, and hence the resources of our RDBMS are freed. Closing a database connection can be done with dbDisconnect(myConnection, …)
.
dbDisconnect()
Notes
1 1 There are different package management systems for different flavours of Linux and discussing them all is not only beyond the scope of this book, but