8/19/2015

My First Tinkering with R Language and RStudio IDE

I had a chance to dabble with the Microsoft Revolution Analytics software for R.

I posted the download and install process here: http://www.bloomconsultingbi.com/2015/08/intro-to-microsoft-revolution-analytics.html

I tried a basic project in RStudio. I created a small text file from text copied off the web and imported it into the environment. Then I manipulated the data in a variety of ways: viewing the table structure, creating variables, renaming column headers, parsing the data, computing Sum, Min, and Max, applying filters, and taking subsets of the data and storing them in other variables.

Started with a delimited text file.


Imported the text file and performed a calculation on Col1.
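The post's steps were done in R. As a rough analogue only, here is a minimal Python sketch (standard library only) of the same idea: parse a delimited text file, check its structure, and compute Sum, Min, and Max on Col1. The sample data, the comma delimiter, and the column names other than Col1 are assumptions. In R itself, the equivalent calls would be along the lines of `read.table()`, `str()`, and `sum()`/`min()`/`max()`.

```python
import csv
import io

# Hypothetical sample data standing in for the small text file from the post.
# The real file's contents and delimiter aren't known, so a comma is assumed.
RAW_TEXT = """Col1,Col2,Col3
10,apple,red
25,banana,yellow
7,cherry,red
"""

def load_rows(text, delimiter=","):
    """Parse delimited text into a list of dicts keyed by the header row."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    return list(reader)

rows = load_rows(RAW_TEXT)

# Rough equivalent of viewing the table structure: column names and row count.
print("columns:", list(rows[0]))
print("rows:", len(rows))

# Parsed values are strings, so convert Col1 to integers before doing math.
col1 = [int(r["Col1"]) for r in rows]
print("sum:", sum(col1), "min:", min(col1), "max:", max(col1))
```

With this sample data, the sum of Col1 is 42, the minimum is 7, and the maximum is 25.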



Created a new variable holding a subset of the data, and renamed the column headers.
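Again as a hedged Python analogue rather than the R code from the post, this sketch filters rows, stores a subset of columns in a new variable, and renames the headers. The filter condition, the sample data, and the new names "Quantity" and "Fruit" are made up for illustration. In R, something like `subset(df, Col3 == "red", select = c(Col1, Col2))` followed by `names(sub) <- c("Quantity", "Fruit")` would do similar work.

```python
import csv
import io

# Hypothetical sample data; only the column name Col1 comes from the post.
RAW_TEXT = """Col1,Col2,Col3
10,apple,red
25,banana,yellow
7,cherry,red
"""

rows = list(csv.DictReader(io.StringIO(RAW_TEXT)))

# Filter: keep only rows where Col3 is "red" (an arbitrary example condition).
red_rows = [r for r in rows if r["Col3"] == "red"]

# Subset: keep just two columns and store the result in a new variable.
subset = [{"Col1": int(r["Col1"]), "Col2": r["Col2"]} for r in red_rows]

# Rename the column headers using a mapping from old names to new names.
rename_map = {"Col1": "Quantity", "Col2": "Fruit"}
renamed = [{rename_map[k]: v for k, v in r.items()} for r in subset]

print(renamed)
```

Keeping the renaming in a single mapping makes it easy to see old and new names side by side.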


I could also view the variable names.


There's a slew of packages available, each with drill-down details and documentation.

This is pretty basic stuff. I got to see the most elementary steps: how to get started, how to import data, what the syntax looks like, how data appears on the screen, how to save the workspace, and so on.

My first reaction: this wasn't as scary as I'd imagined. R is a computer language like any other. It has commands, a specific syntax, inputs and outputs, data manipulation, and so on.

Do I know every statistics command and what it does? No.

The R language reminds me of a combination of tools. For data manipulation, it behaves similarly to Microsoft SSIS. Yet SSIS is mostly drag-and-drop visual components with embedded commands, contained in a package that gets executed and scheduled.

R reminds me more of Apache Pig, in that both manipulate data files through a custom language, although R is designed for statistical purposes: plotting and interpreting the data.

That said, I'm not an expert in either one.

From my understanding, the Data Engineer collects data from a variety of sources, including Hadoop. They whittle the data down into a file, perhaps a CSV, and hand it off to the Data Scientist, who imports the file into a statistical language such as R and analyzes the data using statistics.
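To make that handoff concrete, here is a toy Python sketch under stated assumptions: the source records, column names, and the two function names (`engineer_export`, `scientist_analyze`) are hypothetical, and in the workflow described above the analysis step would happen in R (for example with `mean()`, `var()`, and `sd()`). The engineer's step trims the records to the needed columns and writes CSV text; the scientist's step reads that CSV back and computes basic statistics.

```python
import csv
import io
import statistics

def engineer_export(records, keep_columns):
    """Hypothetical Data Engineer step: whittle records down to CSV text."""
    buf = io.StringIO()
    # extrasaction="ignore" silently drops columns not in keep_columns.
    writer = csv.DictWriter(buf, fieldnames=keep_columns, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()

def scientist_analyze(csv_text, column):
    """Hypothetical Data Scientist step: import the CSV, summarize one column."""
    values = [float(r[column]) for r in csv.DictReader(io.StringIO(csv_text))]
    return {
        "mean": statistics.mean(values),
        "variance": statistics.variance(values),  # sample variance
        "stdev": statistics.stdev(values),        # sample standard deviation
    }

# Made-up source records; in practice these might come from Hadoop,
# a relational database, or other systems.
source = [
    {"id": 1, "region": "east", "sales": 2.0, "notes": "x"},
    {"id": 2, "region": "west", "sales": 4.0, "notes": "y"},
    {"id": 3, "region": "east", "sales": 6.0, "notes": "z"},
]

handoff = engineer_export(source, ["region", "sales"])
result = scientist_analyze(handoff, "sales")
print(result)
```

Like R's `var()` and `sd()`, Python's `statistics.variance` and `statistics.stdev` compute sample (n - 1) versions.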

Having been a Microsoft developer for 20 years, I think it makes sense to use Revolution Analytics, now acquired by Microsoft, since R will be (if it isn't already) embedded in many of the tools I already use, like SQL Server Reporting Services, Integration Services, Power BI, and Azure Machine Learning.

I think traditional reporting, with data from relational databases processed through ETL into reports and dashboards, will continue. But to stay relevant as a Data Professional going forward, you're going to have to learn new skills such as R.

Actually, I took Statistics in 1989 at the University of Florida and really enjoyed the class: standard deviations, means, variances. I also took Calculus, twice. The first time I dropped the class; the second time I taught myself. So I do have an understanding of math and statistics, although it's rusty.

I was pleasantly surprised by how easy it was to pick up the basics of RStudio in an hour or two: the building blocks for advancing into real statistics.

If my memory serves correctly, when I attended SQL PASS in Charlotte a few years ago, Revolution Analytics had a booth set up and I spoke with them. Now they are part of Microsoft. It seems like a good addition to the data stack.
