Welcome to IBM, John

After I joined IBM in 1974, my first job was as a Program Support Representative (PSR) in the Field Engineering division of the company. Our division was responsible for finding and fixing customer problems with IBM hardware and software. PSRs had the software responsibility for mainframe operating systems (OS) and major subsystems that ran under the OS.

I had been trained to know the ins and outs of the OS software and how to correct flaws, failures, and customer misuse. Customers would call our dispatch when they experienced an outage or unexpected result and we would go and work the problem to a satisfactory conclusion. Often the error was easy to find or it was known with a ready fix, but just as often problems could take a major effort to correct.

Our dispatches were prioritized by the severity of the problem. Severity 1, or Sev 1, meant that the system was completely unusable - though customers often abused this severity to try to get speedier resolution for less critical, but important, problems. Most of the time the errors were Severity 3, and many problems were resolved within a week or less. (True Severity 2 & 3s were worked until resolved.)

I was assigned to Insurance Company of North America (INA, Inc., since bought out by Travelers Insurance) and, since I was an onsite PSR for them, I did not usually get dispatched unless the problem was off hours. I was also assigned to GTE Data Services and a small NJ bank whose name I have forgotten. I tried to visit both on a regular basis and they would save minor problems for my visits.

So one day, at my desk at INA, my pager went off and I called in to discover that the CICS group at GTE had a severity (Sev) 1 problem (this was hard to believe, since CICS was not in production - Sev 3 would have been closer, but the customer is always right, even when they aren't). I let INA know where I was headed (in case they needed me) and drove up to GTE, about 20 minutes away. In the parking lot, my pager went off again and, at the guard desk, I called dispatch to discover another Sev 1 at GTE with the VS1 (Virtual Storage One operating system) group. This could really be a Sev 1 since it was their main operating system. Just as I was getting my badge to go in, my pager went off another time. I made a quick call, and the dispatcher informed me there was another Sev 1, also at GTE, in the NCP (Network Control Program - another type of operating system, for networks) group. This was also an in-production program and would be prioritized near the top.

[Image: IBM 3705 front panel]

Upon entering the system programmers' area, the VS1 guy told me that his error was on a test system and could wait on the other two, so I started for the network area. I was intercepted by the IBM SE (systems engineer - a software configuration specialist in the Sales Division), who demanded that I fix the CICS problem first, since our (FE's) fix package had caused it and a major checkpoint in the project was imminent. Looking at the documentation, I started to question him and the customer on how the problem had presented itself. I won't go into detail, but the SE had misunderstood what the word cumulative meant. When applying the new fix tape, which was an accumulation of all previous fix tapes, he had advised the customer to forcibly unapply the previous fixes and then force on the new fixes (even when they had prerequisites he had removed). The result was disastrous and there was no way to unravel it without weeks of work. The customer understood immediately that they had to go back to a fresh install and put the fix tape on, allowing it to call out and install all prereqs and coreqs. This would probably take a day or two. The SE was mad and told me I hadn't heard the end of it (of course, once he checked with his colleagues, I never heard from him again - or saw him again for that matter - since GTE barred him from the account).

The network guy was busy, so I went to the VS1 guy and discovered that his test program under VS1 was failing, but the type of Abnormal End (ABEND) reported was actually caused by a different one that the system couldn't interpret. I knew of a trick in VS1 to trap this type of problem; he tried it, caught the real error, fixed his bug, and everything worked.

Wow, in less than 20 minutes, I had resolved two Severity 1 calls at the account and had one to go. The network problem appeared to be a real one and might take a while to resolve, so I started to gather data. There were three machines that ran NCP and they were all being connected to intelligent workstations out on the network somewhere. Two systems worked just fine: commands and data were sent out and the receiving machine responded properly. But the third would not work. I looked at traces of the network traffic and the outbound appeared clean. So I called software support and, within a half hour, had an NCP specialist on the line. In the meantime, I had been digging and started to come to the conclusion that the sending machine was not sending out the code to indicate that what followed was data (this was known as transparency), so maybe the receiving machine thought everything was a command or everything was data. I mentioned this to the specialist and he said, "Wait a minute, I saw something in the hardware tips about a bad card in the processor that causes transparency problems." Sure enough, it was a known problem, but we didn't know if the card in our failing processor was at that version or why the other two didn't fail. I asked the customer to switch the connection of a working system to the non-working one and vice versa and, lo and behold, the problem followed the processor.

The regular CE (Customer Engineer - hardware specialist) was not there, and the guy who was there was a new hire on a fast path to management (read: very inept at fixing anything; we called them "Suits"). He checked and could verify that the cards in the working machines were up to date, but the third machine had a downlevel card. The EC (Engineering Change) number I gave him didn't match up since it had been subsequently updated, and it took him a bit of discovery (5 seconds for a normal CE) to find the new number. Then he drove to Philadelphia to the parts center to get the part.

I felt good as I went back to INA, three Severity Ones solved in less than three hours. Normally, I wouldn't have three in a month.

A week later, at a branch meeting, I expected that this feat would be recognized in some way. Often, cash bonuses were awarded, etc., and this all looked good in the employee jacket. So, when my manager got up, I expected at least an attaboy. Instead he gave an IBM Means Service (our highest award) to another PSR (well deserved and not begrudged in the least) and sat down. Next the hardware manager of the account got up. Technically, all service in an account was owned by the hardware manager assigned to it, so he would also be a logical person to give an award for service in his account. He started talking about how this NCP problem had been a thorn to the customer and how hard work, teamwork and persistence had saved the day. He then awarded the Suit $250 and sat down. The meeting then ended.

As I was leaving the meeting (very disappointed and disillusioned), the CE for the account (a good friend of mine) whispered to me, "Welcome to the real IBM, John."

Davdan @ 2008-2018