8/30/2015

Empower the Users through Self Service Reporting

In 2007 or so, I worked a job for the County.  We supported many agencies.  In doing so, we wrote and maintained the applications and data.

I was assigned a new project to help out with reports.  During our first meeting, the client mentioned their data was siloed behind our walls and they never had access to their own data, except for what they downloaded off the internal application.


The conference room had a computer, so during the meeting I logged on, opened Microsoft Access, connected to the development instance of their data stored in Oracle, and proceeded to download their entire database in about a minute.


I then opened a copy of Crystal Reports, pointed it to the Access database, and used the Wizard to create a quick report.  Within a minute, we had a report displayed on the screen from the projector.


The Agency could see how easy it was to view the data and build a report.  They couldn't believe it.  They asked us for access to the database, how they could purchase 5 licenses of Crystal Reports, and some of my time to get them up to speed on the reporting tool.


It was great to be the one to open their eyes and see their reaction.  A hidden world opened up in just a few minutes.  Empowerment.  Self service.  No longer dependent on IT.


Fast forward to today: just about every organization is aware of the benefits of accessing their data.  Back then, being a one-eyed person in the land of the blind made a huge impact.

8/29/2015

From Reporting to Hadoop to Machine Learning to Black Box Algorithms

Back in the day, the DBA was the gatekeeper to the data.  Very tight access.  They had the keys to the kingdom.

Programmers were granted access to the applications, reluctantly.  First client-server, then web and now mobile.


And then there were the report writers or Business Intelligence people.  DBAs never liked them much.  Always pulling reports, slowing down the database, causing locks.  Writing crappy code.


Enter Hadoop.  I didn't see many, if any, DBAs making the leap into Hadoop.  Since they don't like writing SQL, and they don't program Java, Python, Scala, Hive or Pig, why would they like Hadoop?


Report Writers - Business Intelligence - Data Warehouse people, are they getting into Hadoop?  Some.  They have an understanding of the data, although unstructured and semi-structured data is a bit foreign.  They can mount the data into SQL tables in Hive and access it from a variety of sources, including ODBC, OData feeds, Power BI, etc.  Just turn the data into relational form and they're off and running.  But they don't necessarily program Java, Python or Scala, from what I've seen.


So if the main players in the data space, the DBAs and the Report Writers / BI / Data Warehouse people, don't have a clearly defined entryway into the world of Hadoop, that could explain why there was more hype than real-life Hadoop growth.


From what I saw, the people working with Hadoop were fresh out of college, at startups, or at major companies with the financial resources to bring in top developers.  Perhaps now, more companies are jumping on the bandwagon.


It used to be, "What is Hadoop?"


Then, "What do I do with Hadoop?"


Now, Hadoop is another tool in the tool chest, for working with data.


The "shiny" newness wore off.


For companies entering the Hadoop space, I suppose they need a real world question to answer, a use case.  Then create a Hadoop sandbox, obtain a developer and admin, and start ingesting data.  Mash it around.  Look for insights.  Use those insights to run the business.


Reporting has always been past tense, numbers explained what happened.


The hot thing now is Machine Learning, algorithms and statistical analysis.  This allows forward thinking, predictive analysis, forecasting.  This space is hot right now.


Looking forward, algorithms will become a commodity.  Just pick one off the shelf, integrate into your app, and you're off and running.


There are already sites out there that let you leverage existing black box algorithms.


You can't expect people with 20 years of experience in IT to magically produce a PhD degree.  It's impossible.  The train left the station two decades ago.  Instead, bring the complexity down to a level that Data Professionals can work with.


That's how I see it.  Time will tell.

8/23/2015

Shortage of Data Scientists

In 1995, when I worked as a Crystal Reports developer, there were no peers. Nobody was strictly a report writer.

There weren't many vendors. Almost no literature. Reports were an afterthought. Right behind documentation.


Then, I noticed Business Intelligence spring up. Scorecards, KPI, Cubes, Dashboards.


Now, the market is flooded with report writing tools. And lots of developers. And the occupation of mere report writer is limited at best.


Data is now the hot topic. Variety of formats. Ways to manipulate and mash. Delivery mechanisms. Self Service.


The report writer has had their legs knocked out from under them. The underlying technology has grown. And a new generation is stretching the limits of what's possible.


Hadoop. Cloud. Mobile. Unstructured data. Streaming data. Statistics. Algorithms. Neural networks. Deep learning. Artificial Intelligence. Micro services. Internet of things.


The industry has matured overnight. And left the traditional report writers in the dust.


The term Data Scientist appeared out of nowhere. One who knows advanced Math, Statistics, algorithms, programming, domain knowledge, communication skills, visualizations, Hadoop, Spark along with a PhD or advanced degree.


How does a report writer accumulate all the required skills overnight? They don't. The new breed already has an assortment of the required skills.


Hiring companies want experienced rock stars out of the gate. There is no established career path allowing report writers to make the leap across the chasm. The number of skills required is quite staggering. Few people have PhD degrees, advanced Statistics knowledge and programming skills. It's possible to acquire these skills on your own, but that takes time and effort. And how does one gain practical experience on the job?


Granted, there are a number of sites, such as Kaggle, that allow people to gain skills. Still, one does not become an expert in Statistics in a few days.


There is a shortage of Data Scientists. I'd say one reason is that people enter the field straight out of school, rather than growing into it from the bottom up.


However, like anything in technology, in order to open up to the masses, they'll need to "dumb" it down a tad so everyday people can contribute as well.  Otherwise, the demand will exceed the supply until the schools can pump out enough qualified people.

 

8/20/2015

Using PowerShell to Pull .ppt Files off the Web

I was perusing the web looking for free online training on Data Mining.

Found a course on the site: http://www.kdnuggets.com/data_mining_course/index.html

The instructions say to prefix the URL and then append the suffix.
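
The same prefix-plus-suffix rule could also be written directly in PowerShell instead of a spreadsheet. Here's a rough sketch. Get-CourseFileUrl is a name I made up for illustration, and the default prefix and suffix are assumed from the URLs used in the script later in this post, not quoted from the course page.

```powershell
# Hypothetical helper: joins the course prefix, a slide-deck name, and the .ppt suffix.
function Get-CourseFileUrl {
    param(
        [Parameter(Mandatory)] [string] $Name,
        # Assumed defaults, taken from the URLs used later in this post.
        [string] $Prefix = 'http://www.kdnuggets.com/data_mining_course/',
        [string] $Suffix = '.ppt'
    )
    return "$Prefix$Name$Suffix"
}

# Two deck names from the script below; the full list would come from the course page.
$names = 'dm1-introduction-ml-data-mining', 'dm13-clustering'
$urls  = $names | ForEach-Object { Get-CourseFileUrl -Name $_ }
$urls
```

Keeping the names in one array means a typo only has to be fixed in one place.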


So I proceeded to copy-paste the list into Excel:


Copy-Pasted the results to a new tab:


Found a site with example script to pull files using PowerShell: https://blog.jourdant.me/3-ways-to-download-files-with-powershell/

Copied and modified script to the following:



Opened PowerShell ISE as Administrator, changed directory to the folder containing the script, loaded the script in PowerShell ISE and ran the following command:

Set-ExecutionPolicy RemoteSigned

Then ran just file #9:


PowerShell Script file:


I'm sure there's a way to loop through all 19 files, and read from Excel, but instead, I just copied and pasted the code 18 times, modified:



Execute Script.ps1:



$url = "http://www.kdnuggets.com/data_mining_course/dm8-decision-tree-cart.ppt"
$output = "$PSScriptRoot\dm8-decision-tree-cart.ppt"
$start_time = Get-Date
Invoke-WebRequest -Uri $url -OutFile $output
Write-Output "Time taken: $((Get-Date).Subtract($start_time).Seconds) second(s)"

$url = "http://www.kdnuggets.com/data_mining_course/dm7-decision-tree-c45.ppt"
$output = "$PSScriptRoot\dm7-decision-tree-c45.ppt"
$start_time = Get-Date
Invoke-WebRequest -Uri $url -OutFile $output
Write-Output "Time taken: $((Get-Date).Subtract($start_time).Seconds) second(s)"

$url = "http://www.kdnuggets.com/data_mining_course/dm6-decision-tree-intro.ppt"
$output = "$PSScriptRoot\dm6-decision-tree-intro.ppt"
$start_time = Get-Date
Invoke-WebRequest -Uri $url -OutFile $output
Write-Output "Time taken: $((Get-Date).Subtract($start_time).Seconds) second(s)"

$url = "http://www.kdnuggets.com/data_mining_course/dm5-classification-basic.ppt"
$output = "$PSScriptRoot\dm5-classification-basic.ppt"
$start_time = Get-Date
Invoke-WebRequest -Uri $url -OutFile $output
Write-Output "Time taken: $((Get-Date).Subtract($start_time).Seconds) second(s)"

$url = "http://www.kdnuggets.com/data_mining_course/dm4-output-representation.ppt"
$output = "$PSScriptRoot\dm4-output-representation.ppt"
$start_time = Get-Date
Invoke-WebRequest -Uri $url -OutFile $output
Write-Output "Time taken: $((Get-Date).Subtract($start_time).Seconds) second(s)"

$url = "http://www.kdnuggets.com/data_mining_course/dm3-input-concepts.ppt"
$output = "$PSScriptRoot\dm3-input-concepts.ppt"
$start_time = Get-Date
Invoke-WebRequest -Uri $url -OutFile $output
Write-Output "Time taken: $((Get-Date).Subtract($start_time).Seconds) second(s)"

$url = "http://www.kdnuggets.com/data_mining_course/dm2-intro-machine-learning-classification.ppt"
$output = "$PSScriptRoot\dm2-intro-machine-learning-classification.ppt"
$start_time = Get-Date
Invoke-WebRequest -Uri $url -OutFile $output
Write-Output "Time taken: $((Get-Date).Subtract($start_time).Seconds) second(s)"

$url = "http://www.kdnuggets.com/data_mining_course/dm1-introduction-ml-data-mining.ppt"
$output = "$PSScriptRoot\dm1-introduction-ml-data-mining.ppt"
$start_time = Get-Date
Invoke-WebRequest -Uri $url -OutFile $output
Write-Output "Time taken: $((Get-Date).Subtract($start_time).Seconds) second(s)"

$url = "http://www.kdnuggets.com/data_mining_course/dm19-data-mining-and-society.ppt"
$output = "$PSScriptRoot\dm19-data-mining-and-society.ppt"
$start_time = Get-Date
Invoke-WebRequest -Uri $url -OutFile $output
Write-Output "Time taken: $((Get-Date).Subtract($start_time).Seconds) second(s)"

$url = "http://www.kdnuggets.com/data_mining_course/dm18-microarray-data-mining.ppt"
$output = "$PSScriptRoot\dm18-microarray-data-mining.ppt"
$start_time = Get-Date
Invoke-WebRequest -Uri $url -OutFile $output
Write-Output "Time taken: $((Get-Date).Subtract($start_time).Seconds) second(s)"

$url = "http://www.kdnuggets.com/data_mining_course/dm17-targeted-marketing-kdd-cup.ppt"
$output = "$PSScriptRoot\dm17-targeted-marketing-kdd-cup.ppt"
$start_time = Get-Date
Invoke-WebRequest -Uri $url -OutFile $output
Write-Output "Time taken: $((Get-Date).Subtract($start_time).Seconds) second(s)"

$url = "http://www.kdnuggets.com/data_mining_course/dm16-summarization-deviation-detection.ppt"
$output = "$PSScriptRoot\dm16-summarization-deviation-detection.ppt"
$start_time = Get-Date
Invoke-WebRequest -Uri $url -OutFile $output
Write-Output "Time taken: $((Get-Date).Subtract($start_time).Seconds) second(s)"

$url = "http://www.kdnuggets.com/data_mining_course/dm15-rules-regression-knn.ppt"
$output = "$PSScriptRoot\dm15-rules-regression-knn.ppt"
$start_time = Get-Date
Invoke-WebRequest -Uri $url -OutFile $output
Write-Output "Time taken: $((Get-Date).Subtract($start_time).Seconds) second(s)"

$url = "http://www.kdnuggets.com/data_mining_course/dm16-rules-regression-knn.ppt"
$output = "$PSScriptRoot\dm16-rules-regression-knn.ppt"
$start_time = Get-Date
Invoke-WebRequest -Uri $url -OutFile $output
Write-Output "Time taken: $((Get-Date).Subtract($start_time).Seconds) second(s)"

$url = "http://www.kdnuggets.com/data_mining_course/dm14-association-rules.ppt"
$output = "$PSScriptRoot\dm14-association-rules.ppt"
$start_time = Get-Date
Invoke-WebRequest -Uri $url -OutFile $output
Write-Output "Time taken: $((Get-Date).Subtract($start_time).Seconds) second(s)"

$url = "http://www.kdnuggets.com/data_mining_course/dm13-clustering.ppt"
$output = "$PSScriptRoot\dm13-clustering.ppt"
$start_time = Get-Date
Invoke-WebRequest -Uri $url -OutFile $output
Write-Output "Time taken: $((Get-Date).Subtract($start_time).Seconds) second(s)"

$url = "http://www.kdnuggets.com/data_mining_course/dm12-data-preparation.ppt"
$output = "$PSScriptRoot\dm12-data-preparation.ppt"
$start_time = Get-Date
Invoke-WebRequest -Uri $url -OutFile $output
Write-Output "Time taken: $((Get-Date).Subtract($start_time).Seconds) second(s)"

$url = "http://www.kdnuggets.com/data_mining_course/dm11-evaluation-lift-cost.ppt"
$output = "$PSScriptRoot\dm11-evaluation-lift-cost.ppt"
$start_time = Get-Date
Invoke-WebRequest -Uri $url -OutFile $output
Write-Output "Time taken: $((Get-Date).Subtract($start_time).Seconds) second(s)"

$url = "http://www.kdnuggets.com/data_mining_course/dm10-evaluation.ppt"
$output = "$PSScriptRoot\dm10-evaluation.ppt"
$start_time = Get-Date
Invoke-WebRequest -Uri $url -OutFile $output
Write-Output "Time taken: $((Get-Date).Subtract($start_time).Seconds) second(s)"

$url = "http://www.kdnuggets.com/data_mining_course/dm9-rules-regression-knn.ppt"
$output = "$PSScriptRoot\dm9-rules-regression-knn.ppt"
$start_time = Get-Date
Invoke-WebRequest -Uri $url -OutFile $output
Write-Output "Time taken: $((Get-Date).Subtract($start_time).Seconds) second(s)" 


Files #5 and #8 weren't found.  Other than that, all files downloaded fine:


That's a basic example of how to pull files off the web using PowerShell ISE.  PowerShell is completely flexible and there's so much you can do with it.  The one thing I remember from watching some online courses: use the "help files".
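
For what it's worth, the loop I skipped could look something like the sketch below. This is not what I ran. Save-CourseFile and its parameters are hypothetical names, the destination defaults to the current folder rather than $PSScriptRoot so it also works when pasted into a console, and the timing uses TotalSeconds because the Seconds property only holds the 0-59 seconds component of the elapsed time. The try/catch lets one missing file get reported without stopping the rest.

```powershell
# Hypothetical function: downloads each named deck and returns one result row per file.
function Save-CourseFile {
    param(
        [Parameter(Mandatory)] [string[]] $Name,
        # Assumed prefix, same as the URLs in the copy-pasted script.
        [string] $Prefix = 'http://www.kdnuggets.com/data_mining_course/',
        [string] $Destination = (Get-Location).Path
    )
    foreach ($n in $Name) {
        $url    = "$Prefix$n.ppt"
        $output = Join-Path $Destination "$n.ppt"
        $start  = Get-Date
        $ok     = $true
        try {
            # -ErrorAction Stop makes a failed request land in the catch block.
            Invoke-WebRequest -Uri $url -OutFile $output -ErrorAction Stop
        }
        catch {
            Write-Warning "Could not download ${url}: $($_.Exception.Message)"
            $ok = $false
        }
        # Emit a result object so failures can be filtered afterwards.
        [pscustomobject]@{
            Name      = $n
            Succeeded = $ok
            Seconds   = [math]::Round(((Get-Date) - $start).TotalSeconds, 1)
        }
    }
}
```

A call such as `Save-CourseFile -Name 'dm12-data-preparation', 'dm13-clustering' | Where-Object { -not $_.Succeeded }` would then list only the files that weren't found.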