1/29/2016

First Microsoft CNTK Project in Visual Studio on Windows

The other day, I posted about CNTK, the new open source technology from Microsoft.

There are now many open source technologies to handle Machine Learning, Deep Learning, Artificial Intelligence, Neural Networks, and many flavors of each.

But this one runs on Microsoft operating systems, a feature that I especially enjoy.

Here's a URL to get started with an example.

So let's get started.



Here we are in Visual Studio 2013, in the project we created the other day:



What is all this stuff?  Well, the project was written in C++, which is portable across Windows, Linux, and Unix.

Here's the folder containing the raw files that comprise the project:



Unfortunately, there isn't much documentation on getting started, as the technology was just released on GitHub the other day.  So I do what I traditionally do when learning any technology: poke around to see what's there.

First thing, there's a folder that contains some PDF files:



Interesting... there's an example folder:





Perhaps some clues...



With a "Readme" file:




"This example demonstrates usage of NDL to train a neural network on CIFAR-10 dataset (http://www.cs.toronto.edu/~kriz/cifar.html).
CIFAR-10 dataset is not included in CNTK distribution but can be easily downloaded and converted by running the following command from this folder"
It says there's a data set on the web that can be downloaded and converted by running a Python script:  python CIFAR_convert.py


So we open a command prompt, navigate to the desired directory, and search for the Python file "CIFAR_convert.py":


 
Thar she blows... except the command did not run, because there's no "Python" entry in the Environment Variables.  So let's add it:


Added the entry to the Environment Variables:



Next error: we need to install "Numpy":



After reading the documentation, there's an alternate way: installing a different Python distribution, Anaconda:





Installing the 64-bit version:



One note: every time the Environment Variables are modified, command prompts that are already open do not see the change.  Open a new command prompt (or, failing that, reboot).
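
To check whether a PATH entry is actually visible to the current session, you can walk the PATH folders yourself. This is only a minimal sketch: `find_on_path` is a hypothetical helper written for illustration, and it assumes the interpreter file is named python.exe on Windows. If Python is not on the PATH yet, run the script by giving the full path to the interpreter.

```python
import os


def find_on_path(program, path_value=None):
    """Return the first full path to `program` found in PATH, or None.

    `find_on_path` is a hypothetical helper, not part of Python or CNTK.
    """
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    for folder in path_value.split(os.pathsep):
        if not folder:
            continue  # skip empty entries such as a trailing separator
        candidate = os.path.join(folder.strip('"'), program)
        if os.path.isfile(candidate):
            return candidate
    return None


# Assumption: the interpreter is python.exe on Windows, "python" elsewhere.
name = "python.exe" if os.name == "nt" else "python"
print(find_on_path(name) or "not found on PATH: " + name)
```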

It still does not recognize Numpy.  Researching, it turns out the Python and Numpy versions need to line up, in my case 2.7.3: http://stackoverflow.com/questions/11200137/installing-numpy-on-64bit-windows-7-with-python-2-7-3



However, this loaded the file into the Anaconda installation; we wanted the c:\Python27\Scripts folder.

So I re-downloaded the correct Numpy file as instructed in the Stack Overflow article above:

http://www.lfd.uci.edu/~gohlke/pythonlibs/

Downloaded the file and copied it to the Scripts directory of the Python27 folder on the c: drive:



Installed successfully!
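
With more than one Python on the machine (here, the plain install and Anaconda), it helps to confirm which interpreter is running, whether it is 32-bit or 64-bit, and whether Numpy is importable from it. A small sketch along those lines; `interpreter_report` is a hypothetical name, and the script only reports on whichever interpreter you launch it with.

```python
import struct
import sys


def interpreter_report():
    """Collect facts about the interpreter that is actually running."""
    try:
        import numpy
        numpy_version = numpy.__version__
    except ImportError:
        numpy_version = None  # Numpy is not installed for this interpreter
    return {
        "executable": sys.executable,
        "python_version": "%d.%d.%d" % sys.version_info[:3],
        "bits": struct.calcsize("P") * 8,  # pointer size in bits: 32 or 64
        "numpy_version": numpy_version,
    }


for key, value in sorted(interpreter_report().items()):
    print("%s: %s" % (key, value))
```

If the executable path points at the wrong install, or the bitness does not match the Numpy file you downloaded, that would explain a package that installs fine but is not recognized.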



So running the application in the IDE, bypassing Debug mode, we get the following:



This indicates "no command line arguments found".  So apparently we need to run the app from the command line and pass in parameters.

The CNTK.exe file resides here: C:\Users\jbloom\Desktop\BloomConsulting\MicrosoftCNTK\CNTK-master\x64\Debug

Here are the arguments to pass to CNTK.exe to execute the job:

configFile=01_Conv.config configName=01_Conv
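
If you would rather drive this from a script than type it each time, the key=value arguments can be assembled in Python. This is a sketch, not an official wrapper: `build_cntk_command` is a hypothetical helper, only `configFile` and `configName` come from this post, and the launch line is left commented out because it assumes CNTK.exe and the config file are reachable from the current folder.

```python
def build_cntk_command(exe, config_file, config_name=None):
    """Return an argument list in the key=value style shown above."""
    command = [exe, "configFile=" + config_file]
    if config_name is not None:
        command.append("configName=" + config_name)
    return command


command = build_cntk_command("CNTK.exe", "01_Conv.config", "01_Conv")
print(" ".join(command))
# To launch it for real, something like:
#   import subprocess
#   subprocess.call(command)
```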


Holy Toledo, looks like it ran, no errors... interrogating further...

It created a log file in the Output directory:



Looking at the log file, it threw an error midway through:

PROCESSED CONFIG WITH ALL VARIABLES RESOLVED
command: Train Test
precision = float
CNTKModelPath: ./Output/Models/01_Convolution
CNTKCommandTrainInfo: Train : 30
CNTKCommandTrainInfo: CNTKNoMoreCommands_Total : 30
CNTKCommandTrainBegin: Train
NDLBuilder Using CPU
Reading UCI file ./Train.txt
About to throw exception 'UCIParser::ParseInit - error opening file'

[CALL STACK]
    >Microsoft::MSR::CNTK::ThrowFormatted
    -Microsoft::MSR::CNTK::RuntimeError<>
    -UCIParser,std::allocator > >::ParseInit
    -Microsoft::MSR::CNTK::UCIFastReader::InitFromConfig
    -Microsoft::MSR::CNTK::UCIFastReader::Init
    -Microsoft::MSR::CNTK::DataReader::DataReader
    -std::_Ref_count_obj >::_Ref_count_obj >
    -std::make_shared,Microsoft::MSR::CNTK::ConfigParameters & __ptr64>
    -CreateObject >
    -DoTrain
    -DoCommands
    -wmainOldCNTKConfig
    -wmain1
    -wmain
    -__tmainCRTStartup
    -wmainCRTStartup
    -BaseThreadInitThunk
    -RtlUserThreadStart

EXCEPTION occurred: UCIParser::ParseInit - error opening file
-------------------------------------------------------------------
Usage: cntk configFile=yourConfigFile
For detailed information please consult the CNTK book
"An Introduction to Computational Networks and the Computational Network Toolkit"
-------------------------------------------------------------------
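
Since the console looked clean while the log did not, a quick scan of the log for exception lines can save some scrolling. A minimal sketch: the "EXCEPTION occurred:" prefix is taken from the log above, and it is an assumption that other CNTK failures are reported the same way. `find_exceptions` is a hypothetical helper.

```python
import re


def find_exceptions(log_text):
    """Return the messages from lines of the form 'EXCEPTION occurred: ...'."""
    pattern = re.compile(r"^EXCEPTION occurred:\s*(.*)$", re.MULTILINE)
    return [message.strip() for message in pattern.findall(log_text)]


# A few lines copied from the log above, used as sample input.
sample = """CNTKCommandTrainBegin: Train
Reading UCI file ./Train.txt
About to throw exception 'UCIParser::ParseInit - error opening file'

EXCEPTION occurred: UCIParser::ParseInit - error opening file
"""
print(find_exceptions(sample))
```

For a real run you would read the log file from the Output folder and pass its text to the function.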


Adding command line arguments to the Visual Studio Project:


"From the output of the above command you simply copy the 'VS debugging command args' to the command arguments of the CNTK project in Visual Studio (Right click on CNTK project -> Properties -> Configuration Properties -> Debugging -> Command Arguments). Start debugging the CNTK project."



It needs to know the actual path of the file, so I modified the command line argument slightly:

configFile=C:\Users\jbloom\Desktop\BloomConsulting\MicrosoftCNTK\CNTK-master\Examples\Image\Miscellaneous\CIFAR-10\01_Conv.config configName=01_Conv
 
Running the Visual Studio project in debug mode lets you step through the C++ code, memory allocation, low-level stuff...

Turns out, when you run through VS, it places the Output folder in a different location:

C:\Users\jbloom\Desktop\BloomConsulting\MicrosoftCNTK\CNTK-master\Source\CNTK\Output


PROCESSED CONFIG WITH ALL VARIABLES RESOLVED
command: Train Test
precision = float
CNTKModelPath: ./Output/Models/01_Convolution
CNTKCommandTrainInfo: Train : 30
CNTKCommandTrainInfo: CNTKNoMoreCommands_Total : 30
CNTKCommandTrainBegin: Train
About to throw exception 'error opening file './Macros.ndl': No such file or directory'

[CALL STACK]
    >Microsoft::MSR::CNTK::ThrowFormatted
    -Microsoft::MSR::CNTK::RuntimeError
    -fopenOrDie
    -::operator()
    -Microsoft::MSR::CNTK::attempt< >
    -Microsoft::MSR::CNTK::attempt< >
    -Microsoft::MSR::CNTK::File::Init
    -Microsoft::MSR::CNTK::File::File
    -Microsoft::MSR::CNTK::ConfigParser::ReadConfigFile
    -Microsoft::MSR::CNTK::ConfigParser::LoadConfigFileAndResolveVariables
    -Microsoft::MSR::CNTK::NDLBuilder::Init
    -Microsoft::MSR::CNTK::NDLBuilder::NDLBuilder
    -std::_Ref_count_obj >::_Ref_count_obj >
    -std::make_shared,Microsoft::MSR::CNTK::ConfigParameters const & __ptr64>
    -DoTrain
    -DoCommands
    -wmainOldCNTKConfig
    -wmain1
    -wmain

attempt: error opening file './Macros.ndl': No such file or directory, retrying 2-th time out of 5...
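
Both failures so far involve paths that start with "./" (./Train.txt and ./Macros.ndl). A likely reading is that these are resolved against the folder the process was started in, which differs between the command prompt and Visual Studio. The sketch below checks which relative files exist from a given working directory; `resolve_against` is a hypothetical helper, and the current directory is used only as a stand-in.

```python
import os


def resolve_against(working_dir, relative_paths):
    """Map each relative path to (absolute path, exists?) for a working directory."""
    report = {}
    for rel in relative_paths:
        full = os.path.normpath(os.path.join(working_dir, rel))
        report[rel] = (full, os.path.exists(full))
    return report


# File names come from the logs above; os.getcwd() stands in for the folder
# the program is started from.
results = resolve_against(os.getcwd(), ["./Macros.ndl", "./Train.txt"])
for rel, (full, found) in sorted(results.items()):
    print("%-14s -> %s (%s)" % (rel, full, "found" if found else "missing"))
```

Running this from the folder you intend to start CNTK in shows at a glance whether the files the config refers to can be found from there.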


Turns out, we need to set the output directory:



Well, if you remember, earlier in the post we got stuck at pip, trying to load the Numpy files, because it loaded them into the Anaconda Python folders.  So I uninstalled the Anaconda application.

Ran the get-pip.py file from the command line again:



Okay, with that installed, we go back to this URL to get the correct version of the files (so that the Python and Numpy versions align): http://www.lfd.uci.edu/~gohlke/pythonlibs/#numpy





Alright!  Reran the Python script to download the file from the web and train the model, except it threw an Out of Memory error.

Turns out Python will eat all your available memory, then kaput!
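
Without knowing what CIFAR_convert.py does internally, one general way to keep memory flat when converting a large data set to text is to write one row at a time instead of building the whole output in memory first. The sketch below uses made-up data: `write_rows`, `fake_rows`, and the row layout are hypothetical and not taken from the CNTK script. Note also that a 32-bit Python can only address a few gigabytes no matter how much RAM is installed.

```python
import io


def write_rows(rows, out):
    """Write each row as one tab-separated line, without keeping rows in memory."""
    count = 0
    for row in rows:
        out.write(u"\t".join(str(value) for value in row) + u"\n")
        count += 1
    return count


def fake_rows(n, width):
    """Generator standing in for a real data set: yields one row at a time."""
    for i in range(n):
        yield [i % 10] + [0] * width  # a label followed by dummy pixel values


buffer = io.StringIO()  # a real conversion would write to an open file instead
written = write_rows(fake_rows(3, 4), buffer)
print("%d rows written" % written)
```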

Well, at this point, I changed to another example project.  I was able to extract the files for another experiment in the same Image folder:



Done!



It created the necessary files:



It actually pulled the files from here.

So to continue from this URL:

Run the example from the Image/MNIST/Data folder using:

CNTK.exe configFile=../Config/01_OneHidden.config

Now we get an access violation...



Well, stepping through the code, I'm not sure this project is supposed to work.  Here's the C++ code that was highlighted in yellow:



int wmain(int argc, wchar_t* argv[]) // wmain wrapper that reports Win32 exceptions
{
    set_terminate(terminate_this);   // insert a termination handler to ensure stderr gets flushed before actually terminating
    _set_error_mode(_OUT_TO_STDERR); // make sure there are no CRT prompts when CNTK is executing

    // Note: this does not seem to work--processes with this seem to just hang instead of terminating
    __try
    {
        return wmain1(argc, argv);
    }
    __except (1 /*EXCEPTION_EXECUTE_HANDLER, see excpt.h--not using constant to avoid Windows header in here*/)
    {
        fprintf(stderr, "CNTK: Win32 exception caught (such an access violation or a stack overflow)\n"); // TODO: separate out these two into a separate message
        fflush(stderr);
        exit(EXIT_FAILURE);
    }
}
#endif

Moving to the third example project, "Speech", we set the command line arguments:

"configFile=C:\Users\jbloom\Desktop\BloomConsulting\MicrosoftCNTK\CNTK-master\Examples\Speech\AN4\Config\FeedForward.config"

or run it from CMD (as administrator):

### Run

Run the example from the Speech/Data folder using:

`cntk configFile=../Config/FeedForward.config`


It throws up a warning message:



Check the boxes, let it run...





Well, another access violation or stack overflow.


That's as far as I'm going to take it for now.  I downloaded the project, got it to compile in Visual Studio, downloaded the required files for Python and its dependencies, and got the versions to line up.  After all that, it appears my old laptop does not have the memory to handle this code at this point in time.



Just realized there's a page that shows recent activity for this project, including open issues:
https://github.com/Microsoft/CNTK/pulse 

So thanks for following along at home.  It's important to stay current with technology.  This just happens to be some rather complex and advanced stuff.  A nice feature: it works on Windows, and it seems to have lots of potential going forward!

Again, you can read the first blog post on this subject: http://www.bloomconsultingbi.com/2016/01/first-try-at-microsoft-cntk.html