This blog contains reflections and thoughts on my work as a software engineer

Wednesday, April 24, 2013

I've recently been part of quite a complex integration project. Lots of fun building it - not so fun maintaining it. The project is a ticket ordering solution where people buy and pay for a ticket to an event in system #1 (a.k.a. The TOC) and are then redirected to our site (a.k.a. The Checkin Site), where they sign up for various events (running, bicycling, kayaking etc.). You should not be able to enter The Checkin Site without having paid for a ticket in The TOC. One of the non-technical requirements was that users should not notice that the two systems operated on different domains, so we put quite a lot of effort into data flow and encryption to ensure that data and login information would flow silently from The TOC to The Checkin Site. That way the shift from one system to the other would be almost transparent to the user.

A number of pain points were discovered along the way. This is a list of discoveries which I hope someone (such as myself in a distant future, when I've forgotten all about the pains and headaches of the last 6 months...) will find useful when building integration stuff. Here goes:

  1. Thou should persist external data as-is 
The solution I implemented parsed a customer's order into a DTO and persisted it alongside other information about the order (timestamps etc.). Not persisting the external data in the exact format we received it, before any parsing took place, proved to be the wrong decision. Everything on our side is stored relationally, and not being able to query a customer's data from The TOC using SQL, but instead having to rely on a mix of web service calls and console applications, is - well... You can imagine the outbursts and occasional teeth-grinding...
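In hindsight, a minimal sketch of the idea could look like the Python below. It assumes a hypothetical JSON order payload and a hypothetical SQLite schema (`raw_orders`, `orders`); nothing here reflects The TOC's real format. The point is that the payload is committed verbatim before any parsing happens, so a parse failure or a lossy DTO never costs you the original, and the original stays queryable with plain SQL.

```python
import json
import sqlite3
from datetime import datetime, timezone


def create_schema(conn: sqlite3.Connection) -> None:
    # Hypothetical schema: raw payloads kept verbatim, parsed columns beside them.
    conn.executescript(
        """
        CREATE TABLE IF NOT EXISTS raw_orders (
            id INTEGER PRIMARY KEY,
            received_at TEXT NOT NULL,
            payload TEXT NOT NULL
        );
        CREATE TABLE IF NOT EXISTS orders (
            raw_id INTEGER NOT NULL REFERENCES raw_orders(id),
            order_id TEXT NOT NULL,
            customer_email TEXT
        );
        """
    )


def store_order(conn: sqlite3.Connection, payload: str) -> int:
    """Persist the payload as received, then parse it into the orders table."""
    received_at = datetime.now(timezone.utc).isoformat()
    cur = conn.execute(
        "INSERT INTO raw_orders (received_at, payload) VALUES (?, ?)",
        (received_at, payload),
    )
    raw_id = cur.lastrowid
    conn.commit()  # the raw copy survives even if parsing below fails

    order = json.loads(payload)  # hypothetical payload format
    conn.execute(
        "INSERT INTO orders (raw_id, order_id, customer_email) VALUES (?, ?, ?)",
        (raw_id, order["orderId"], order.get("customer", {}).get("email")),
    )
    conn.commit()
    return raw_id


conn = sqlite3.connect(":memory:")
create_schema(conn)
raw_id = store_order(conn, '{"orderId": "A-1", "customer": {"email": "jane@example.com"}}')
# The original payload is still one SQL query away.
print(conn.execute("SELECT payload FROM raw_orders WHERE id = ?", (raw_id,)).fetchone()[0])
```

In a real setup the raw table would live in the same database as everything else, so a query against `raw_orders.payload` can replace a round of web service calls when a customer's data needs investigating.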

  2. Thou should agree on required fields
Is a person's last name required in your system in order to create a person? What about email? Gender? It proved to be a less-than-Apple-like user experience once users entered The Checkin Site, because our two domains (ticket ordering / payment versus event reservation) had different perceptions of data validation. Had we only sat down and talked for a while about the input fields in the two systems, we would probably have discovered that a person's birthdate is essential data in our system (due to age validation and other stuff) but less than important when ordering and paying for a ticket. Even though users had entered their personal data in The TOC and that data was transferred to us, they still had to fill in the blanks, such as age and gender, once they entered The Checkin Site. It would have been a much smoother experience from the user's point of view to enter all information in the same workflow and later just be presented with the data they had already entered.
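A cheap way to force that conversation is to write the required fields down per system and diff them. The sketch below uses hypothetical field names and hard-coded sets as an assumption; the first function lists fields The Checkin Site needs that The TOC never asks for (candidates for the ordering form), and the second lists what a transferred record is still missing.

```python
from typing import Mapping

# Hypothetical required-field lists; the real ones come out of the meeting.
TOC_REQUIRED = {"first_name", "last_name", "email"}
CHECKIN_REQUIRED = {"first_name", "last_name", "email", "birthdate", "gender"}


def fields_to_collect_upstream(upstream: set[str], downstream: set[str]) -> set[str]:
    """Fields the downstream system requires but the upstream form never asks for."""
    return downstream - upstream


def missing_fields(person: Mapping[str, object], required: set[str]) -> set[str]:
    """Required fields that are absent or blank in a transferred record."""
    return {name for name in required if person.get(name) in (None, "")}


print(sorted(fields_to_collect_upstream(TOC_REQUIRED, CHECKIN_REQUIRED)))
# ['birthdate', 'gender']
```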

  3. Thou should agree early on end-to-end integration test phases
When planning, we didn't take into account that The TOC was still undergoing heavy development, even though we had agreed on a testing / bugfixing phase one month prior to release. The result was numerous test cases (such as "Order two adult and one child tickets, buy additional products X and Y in a quantity of 3, go to The Checkin Site, subscribe to a running event. Validate your receipt onscreen.") which were outdated before they could be handed to our testers, because the guidelines for ordering tickets in The TOC didn't match what was currently running on The TOC's testing environment. As a result, basically all tests came back without the tester ever having made it to The Checkin Site, because the testers weren't able to order and pay for a ticket using the guidelines provided... Which left developers sitting on their hands, ready to fix bugs, while no bugs were reported. We ended up cancelling the setup, took all quantitative tests and gave them to a dedicated resource sitting next to our development team, to lower the communication barrier between the tester and our developers whenever flaws in the guidelines were discovered.

  4. Thou should be able to subscribe to events 
Only a few days after release we found out that quite a few people didn't realize there was a second step involved - for whatever reason a number of people never clicked the 200x300 pixel orange "Click here to enter the Check-in site" button on the receipt from The TOC. Because The TOC has no notification service, The Checkin Site wasn't notified when new tickets were ordered, so the only option for us at The Checkin Site was to implement a pull-based console application asking "Give me all customers who have ordered something during the last 24 hours" and checking whether all customers who had updated their ticket order during the last 24 hours were known to us. They might never have hit the Checkin button, or they could have added upsell products (such as breakfast Thursday) to their order after their initial order and subsequent check-in had taken place. Updates to existing orders especially proved to be a challenge until the synchronization thingie started taking those scenarios into account. We scheduled the job to run every 24 hours. We could have had a shorter interval between runs, but I would much rather have had a message-based solution where we could subscribe to notifications as the primary source of updates. I doubt we would have had a setup without some sort of pull mechanism to do a full sync once in a while, but it is a tedious and slow way to make data flow your way if you don't learn anything about your customers unless you repeatedly ask for it.
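The pull job described above boils down to comparing what The TOC reports against what we already know. Below is a minimal sketch under assumptions: `RemoteOrder`, `plan_sync` and `window_start` are hypothetical names, fetching the orders is left out, and each order is assumed to carry a last-modified timestamp. It separates orders we have never seen (the customer never clicked through) from orders changed since we last synchronized them (e.g. an upsell added after check-in).

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class RemoteOrder:
    order_id: str
    customer_id: str
    modified_at: datetime  # assumed to be reported by The TOC


def window_start(last_run: datetime, overlap: timedelta = timedelta(hours=1)) -> datetime:
    """Start slightly before the previous run so nothing modified mid-run slips through."""
    return last_run - overlap


def plan_sync(remote: list[RemoteOrder], known: dict[str, datetime],
              since: datetime) -> tuple[list[RemoteOrder], list[RemoteOrder]]:
    """Split remote orders modified since `since` into new and updated ones.

    `known` maps order_id -> modified_at as last synchronized by The Checkin Site.
    """
    new: list[RemoteOrder] = []
    updated: list[RemoteOrder] = []
    for order in remote:
        if order.modified_at < since:
            continue  # outside the window; left to an occasional full sync
        seen = known.get(order.order_id)
        if seen is None:
            new.append(order)  # e.g. the customer never clicked through
        elif order.modified_at > seen:
            updated.append(order)  # e.g. an upsell product added after check-in
    return new, updated
```

Starting the window slightly before the previous run, rather than exactly 24 hours back, is one way to avoid missing orders modified while the job was running; seeing an order twice is harmless as long as the sync itself is idempotent.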

  5. Thou should reconcile data early and often 
We encountered problems along the way with The TOC's payment gateway, which generated a lot of customer support because users' transactions timed out. This in turn caused a lot of manual handling of customers' data in The TOC's backend, which in some cases caused a customer to be created twice, with tickets attached to both customer records in The TOC. Our synchronization worked by getting all orders for a given customer - we didn't (and still don't) have anything to merge two customers' data. This in turn caused some customers' synchronizations to fail, because a given ticket type (an adult ticket) is required in order to create an event subscription. And this in turn causes problems when validating that we actually have all data in our systems, because the numbers don't match... We should have planned early in the process for data being out of sync and implemented patterns for dealing with customer records not matching what we expected (or agreed on). One of the pitfalls was that the developers at The TOC's company assured us that no customer could submit an order without ordering an adult ticket. True - but once the problems with their payment provider kicked in, customers' data was handled manually in backend systems which didn't take our special business rules into account... Voilà, data wasn't what we expected, even though we were all in good faith and worked well together to win the race. At the very least we should, for our part, have had a plan for reconciling e.g. all adult tickets ordered versus what we had registered in our own backend. It should have been possible from day 1 to merge two customers' data into one order. The moral regarding data from external systems: Trust nobody. Expect the unexpected. When (not if) a sync job fails on a given record - can you gracefully recover? Who gets notified and how?
Settle early on how you will handle ongoing support when the unexpected happens and somebody needs to dive into the bowels of the system to figure out what is going on.
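Here is a sketch of the kind of reconciliation I wish we had built from day 1. All names are hypothetical, and matching duplicate customers on a normalized email address is an assumption, not something The TOC guaranteed. `reconcile` compares adult tickets on both sides, flags customers created twice and flags merged orders without an adult ticket; `sync_all` shows one way to let a single bad record fail without stopping the job, and to hand the failures to whoever should be notified.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Callable


@dataclass(frozen=True)
class Ticket:
    ticket_id: str
    customer_id: str  # customer record in The TOC
    email: str
    ticket_type: str  # e.g. "adult" or "child"


@dataclass
class Report:
    missing_locally: set[str] = field(default_factory=set)   # adult tickets we never received
    unknown_remotely: set[str] = field(default_factory=set)  # adult tickets The TOC no longer reports
    duplicate_customers: dict[str, set[str]] = field(default_factory=dict)  # email -> customer ids
    no_adult_ticket: set[str] = field(default_factory=set)   # emails without any adult ticket


def reconcile(remote: list[Ticket], local_adult_ids: set[str]) -> Report:
    report = Report()
    remote_adult_ids = {t.ticket_id for t in remote if t.ticket_type == "adult"}
    report.missing_locally = remote_adult_ids - local_adult_ids
    report.unknown_remotely = local_adult_ids - remote_adult_ids

    # Group tickets per person; matching on normalized email is an assumption.
    by_email: dict[str, list[Ticket]] = defaultdict(list)
    for t in remote:
        by_email[t.email.strip().lower()].append(t)
    for email, tickets in by_email.items():
        customer_ids = {t.customer_id for t in tickets}
        if len(customer_ids) > 1:
            report.duplicate_customers[email] = customer_ids  # created twice
        # Apply the adult-ticket rule to the merged tickets, not per record.
        if not any(t.ticket_type == "adult" for t in tickets):
            report.no_adult_ticket.add(email)
    return report


def sync_all(records: dict[str, dict], sync_one: Callable[[dict], None],
             notify: Callable[[dict[str, str]], None]) -> dict[str, str]:
    """Sync each record independently and report the failures instead of aborting."""
    failures: dict[str, str] = {}
    for record_id, record in records.items():
        try:
            sync_one(record)
        except Exception as exc:  # one bad record must not stop the whole job
            failures[record_id] = repr(exc)
    if failures:
        notify(failures)  # e.g. mail the support team; the channel is up to you
    return failures
```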

 Well... there's probably more to come but I can't think of more "lessons learned" right now. Until next time...

Regards K.