One time I was working for a credit card processing company. We were upgrading a product I supported called Vantive. I worked closely with the Oracle DBA. He got stuck on the upgrade and it simply would not complete. My boss asked me to investigate. He said if we had to fly a resource in from the vendor, it would cost a lot of money for food, lodging, and travel, plus billable hours. This was a great concern.
So I began to investigate. I looked through the error logs. Nothing stood out. I spoke with the DBA and asked where it stopped working. After some grunt work, I concluded that one of our custom tables was the culprit. I asked the DBA to recreate the table with Varchar instead of Char fields. He was not thrilled about it, but he did it, then reran the upgrade and presto, it worked. That was a good troubleshooting experience.
At the same company, we were deploying a product in Utah for the new help desk, which I supported. We tested and worked with the IT resource on their side. On the day of deployment, I gave explicit instructions. However, that day they complained the system was too slow. Again, my boss asked me to investigate. I got back to my computer and couldn't find anything wrong. I went back to the boss and said, "I think they are pointing to Test, not Production." He said, "Go shut down the Test server," which I did. Within 5 minutes, we got a call from our friend in Utah, who said the system had just stopped working. So we had him correct his TNSNames file to point to Production. Problem solved.
Both of those were kind of black box scenarios, which are difficult to troubleshoot.
When I got hired as a Senior Java Programmer, I had actually never written a line of code in Java. On my first day on the job, I was assigned the task of getting the CAPTCHA piece working on the public website. Their best Java guy couldn't figure it out. I investigated the code and spoke with the Architecture guy. It turned out the Production servers were more complicated than they thought: there were two internal and two external servers, and the user got bounced around between them, with each server adding its own cookies. I added some code to handle this and the problem was solved.
In that case, I had to speak with an expert on the infrastructure to figure out a solution.
Another time, I was working for an insurance company, and one of the .NET developers had placed some code to encrypt the passwords in one direction, meaning you couldn't reverse the encryption to see the actual value. I was looking at the code one day and found that the password was plain text when it got to the server, so I added a function to store the passwords in a table in raw format before they got encrypted. For some reason the production server was wide open to us developers, so we could do what we wanted, and I must have left that code in there, so as each user logged in, I captured their credentials to a table. Then one day the owner came to me saying they were in a pickle because they needed all the passwords. I said, "Oh, I happen to have every single password." I created a quick data dump and the boss was pleased. Then I removed the function and deleted the password table, and all was well.
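For anyone who hasn't seen "one direction" password storage, here is a minimal sketch in Python. It is not the .NET code from that job; the function names, the salt size, and the iteration count are all assumptions I picked for illustration. The point it shows is that the server only keeps a salt and a digest, and checks a login by hashing the attempt again and comparing.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple

ITERATIONS = 200_000  # assumed work factor, chosen only for illustration


def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Return (salt, digest). The digest cannot be turned back into the password."""
    if salt is None:
        salt = os.urandom(16)  # fresh random salt for each stored password
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Re-hash the attempt with the stored salt and compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected)
```

Usage would look like `salt, digest = hash_password("s3cret")` at sign-up and `verify_password(attempt, salt, digest)` at login. Nothing stored in the table lets you read the original value back, which is exactly why the owners were in a pickle.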
At the same company, I was hired as a Crystal Report Developer. My job was to recreate the report from Visual Basic 6. Looking at the code, there was an outer query and an inner query. It turned out the developer did not know what joins were. I found a bug in the code, went to my manager, and reported it. He said, "That's impossible, the entire company's numbers are based on those queries. If there was a problem I'd know about it." (He wrote it.) I said, "Okay," and went back to my desk. A few days later he called me into his office and asked me to explain the bug I found. It turned out it was a bug, which I fixed, and then he asked me to create a data warehouse storing all the data, using those numbers as the foundation. He then left to start his own company and I was the sole programmer left. The day of his departure, the entire RAID server went down (cough, cough). So the two other owners asked me if I could restore all 50,000 PDFs that got wiped out. Having never seen the app before, I took a day to figure it out, then kicked off the job, which ran for 3 days, restoring just about every PDF, and the website was restored. The owner said I had a job for life!
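To show what I mean by an outer query and an inner query versus a join, here is a small Python sketch using the built-in sqlite3 module. This is not the original VB6 code, and the `customers` and `orders` tables and their data are made up for illustration. The first function loops over an outer query and fires an inner query per row; the second gets the same totals from a single joined query.

```python
import sqlite3

# Hypothetical tables and rows, invented for this example.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'Acme'), (2, 'Globex');
    INSERT INTO orders VALUES (1, 1, 100.0), (2, 1, 50.0), (3, 2, 75.0);
""")


def totals_nested(conn):
    """Outer query over customers, one inner query per customer row."""
    totals = {}
    outer = conn.execute("SELECT id, name FROM customers ORDER BY id").fetchall()
    for cust_id, name in outer:
        inner = conn.execute(
            "SELECT SUM(amount) FROM orders WHERE customer_id = ?", (cust_id,)
        ).fetchone()
        totals[name] = inner[0]
    return totals


def totals_joined(conn):
    """Same totals from one query, letting the database do the join."""
    rows = conn.execute("""
        SELECT c.name, SUM(o.amount)
        FROM customers c
        JOIN orders o ON o.customer_id = c.id
        GROUP BY c.id, c.name
    """).fetchall()
    return dict(rows)
```

With this sample data both functions return the same totals. One difference to keep in mind: the inner join drops customers that have no orders, so a `LEFT JOIN` would be needed to keep them in the result.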
Another time I was in a meeting with an internal client. We ran their application on our servers and stored their data in Oracle. The users never had access to their own data. When I learned of this in the meeting, I opened an Access database, connected to their server, and downloaded their entire database to Access in about 2 minutes, and the users got to see their data for the very first time. I then created some quick reports in Crystal Reports on the fly. The client's jaw dropped, and they requested immediate access to their data and 5 licenses for Crystal, after which I mentored them in creating their own reports.
Those are the highlights I can recall at the moment.
There were some downsides as well. While working for the credit card processing company, I converted an Access database to Oracle. The application was pre-Instant Messenger, so each of the 100+ clients would ping the Access database every minute. When it went live in production pointing to the Oracle database, it took down the Unix server. I had to meet with the boss and the users, as all fingers were pointing at me. In the end, I removed the IM feature from the app and the users purchased a true IM application for the help desk.
Another time, I ran a query I had found on the internet, which read every field in every table of the database. It ran for 16 hours, then I tried to kill it, and nothing happened. It turned out it caused the data warehouse to not load, and the DBAs spent a few days tracking it down to my thread. Yikes.
Another time I added some code to my public-facing Java JSP database application, after which users were getting other users' sessions, so we had to back out the code changes immediately. We spent a few weeks troubleshooting the issue. I happened to find a link on Google from some university out west that reported the same issue and provided a solution. I added it, it went to QA and finally production, and the users were happy. That was a WebSphere Java issue with serializing session state.
At the same company, we deployed a PCI solution that redirected users to an online credit card payment system, which did a round trip back to our servers, where we updated the mainframe and database to confirm payment. Except some users closed the browser prematurely, so we didn't record the payment, and we sometimes shut off a person's water at their house for lack of payment when in fact they had paid, sometimes more than once. Yikes.
So sometimes we win, sometimes we don't. In the end we learn from our mistakes and try not to repeat them. As programmers, we walk the line every day, sometimes without a safety net. And some days we are the hero.