Togaware DATA MINING
Desktop Survival Guide
by Graham Williams

R is a statistical and data mining package consisting of a programming language and a graphics system. It is used throughout this book to illustrate how to do data mining. In the following sections of this chapter we introduce the basics of R. Many examples are provided, and you can readily try them yourself to facilitate learning. You will also find many examples on the R-help mailing list at https://stat.ethz.ch/mailman/listinfo/r-help. As an advocate of learning by example, and motivated by the programming paradigm of "programming by example" (Cypher, 1993), my intention is that you will be able to replicate the examples from the book, and then fine-tune them to suit your own needs.

R is a language, and the basic modus operandi is to write sentences expressed in that language. After a while you will want to do more than issue single, simple commands (sentences); you will want to write paragraphs and full novels in this language! R script files (often with the .R file name extension) are the place to write them. You can then re-run your scripts to transform, at will and automatically, your source data into information and knowledge.
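
As a minimal sketch of what such a script might contain, consider the following. The file name example.R and the data values are hypothetical, chosen purely for illustration. Saved to a file, the script could be re-run at any time with source("example.R").

```r
# example.R: a hypothetical script, for illustration only.
# Build a small made-up dataset, then summarise it.
heights <- c(1.62, 1.75, 1.80, 1.68)   # invented values, in metres
mean.height <- mean(heights)           # average of the four values
cat("Mean height:", mean.height, "\n") # report the result
```

Each line is one "sentence" in the sense used above; the script as a whole is the paragraph that can be replayed whenever the source data changes.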

This chapter begins with an overview of some of the key advantages (and disadvantages) of using R and continues with a guide to interacting with R. The recommended interface is through the powerful Emacs editor, augmented with the ESS package, under either GNU/Linux or MS/Windows. This is a personal preference and you may prefer some of the alternatives we discuss.

There are graphical user interface (GUI) tools available for accessing R, but they are generally not as feature-rich as those provided by commercial data mining tools. This is both a disadvantage and an advantage! For R this results in a steeper learning curve, but once you are familiar with R, performing operations over the same or similar datasets becomes very easy using its programming language interface.

Let's start with some of the advantages of using R:

Whilst the advantages might flow from the pen with a great deal of enthusiasm, it is useful to note some of the disadvantages or weaknesses of R, even if they are perhaps transitory!

The remaining sections of this chapter can generally be skipped when reading through the book, but they provide a basic reference guide for using R, in particular for loading and manipulating data, and for some of its programming capabilities. While Chapter 2 deals in detail with creating data in R, we introduce some of the basics here. The most basic needs include creating simple datasets, being familiar with the basic data types and programming concepts, and knowing how to get help.
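
To give a flavour of these basic needs, here is a small sketch using only functions from base R. The dataset ds and all of its values are invented for illustration and do not come from the book's own examples.

```r
# A small, made-up dataset (all values are invented for illustration).
ds <- data.frame(id    = 1:3,
                 name  = c("a", "b", "c"),
                 score = c(2.5, 3.0, 4.5),
                 stringsAsFactors = FALSE)

# Inspect the basic data type of each column.
class(ds$id)     # an integer vector
class(ds$name)   # a character vector
class(ds$score)  # a numeric vector
str(ds)          # compact overview of the whole structure

# Getting help on a function or topic.
help(mean)
?data.frame
```

In an interactive session the last two lines open the relevant help pages, which is usually the quickest way to check a function's arguments.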



Copyright © 2004-2005
Brought to you by Togaware.