Tech Meets Business | Tarun On Reducing Cloud Spends By 50%

Pooja: Hi guys. I’m Pooja, and I look after Strategic Initiatives at Zapr. Welcome to the first session of our special series, ‘Tech Meets Business’. One of the most interesting things for a software engineer is that his or her work directly impacts the industry. Here at Zapr, too, we want to make a real impact in the media and advertising industry by constantly improving and innovating. Today we have Tarun Gupta with us. He is a Technical Architect here at Zapr. Let’s talk to Tarun!

Tarun: Happy to be here, Pooja. We recently did a major activity of optimizing our cloud spends, where I was responsible for reducing them.

Pooja: So can you help us understand how this requirement came up in the first place?

Tarun: Yeah, sure. We were about to integrate with a major partner, and we expected a fivefold increase in our user base. Using our predictive model analysis, we realized that our costs were going to shoot up and that these kinds of costs were not really sustainable. We needed to figure out how these costs were distributed across our systems: where exactly we could reduce cost, and which redundant systems we could do away with. That’s how we came up with the idea of cost optimization in the first place. While this was the major reason, the other reason was that, as a growing startup, Zapr was also looking to expand its business across different geographies, even international ones, which would have increased our costs further. So we wanted to do this cost optimization exercise to get a better understanding of our cost distribution across different systems.

Pooja: So Tarun, why is cloud storage and computation so expensive? And how much does a company like Zapr rely on the cloud?

Tarun: The cloud is becoming cheaper day by day, and the main benefit of using the cloud is the elasticity you get. You also do away with the cost of having an expert team that manages on-premise infra. But going to the cloud has its own caveat: as you grow as a startup, your user base grows, the data you store grows, and the amount of data you are churning grows. So it requires putting best practices in place around operating on infra.

Pooja: You did the impossible by reducing the cost by 50% in a span of about 2 months. So how did you go about achieving this?

Tarun: Before starting with any optimization activity, we did a very thorough tagging activity. It helped us understand where the cost was actually coming from. We tagged each and every one of our resources with important tags like project, service, and team, and with that we got a clear picture of where each cost was coming from and how much each component was contributing. After that, we questioned the existence of every component and then went ahead with understanding why that component was costing so much.
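The tagging step described here can be pictured with a small sketch. It is only an illustration, assuming billing data has already been exported as a list of line items. The resource names, tag values, and costs below are hypothetical and are not Zapr's actual data or tooling.

```python
from collections import defaultdict

# Hypothetical billing line items. Each resource carries tags such as
# project, service, and team, plus a monthly cost in some currency unit.
line_items = [
    {"resource": "vm-1", "tags": {"project": "ingest", "service": "api", "team": "data"}, "cost": 120.0},
    {"resource": "bucket-1", "tags": {"project": "ingest", "service": "storage", "team": "data"}, "cost": 80.0},
    {"resource": "vm-2", "tags": {"project": "reports", "service": "batch", "team": "analytics"}, "cost": 50.0},
    {"resource": "vm-3", "tags": {}, "cost": 30.0},  # untagged resource
]


def cost_by_tag(items, tag_key):
    """Sum cost per value of one tag; resources missing the tag go under 'untagged'."""
    totals = defaultdict(float)
    for item in items:
        totals[item["tags"].get(tag_key, "untagged")] += item["cost"]
    return dict(totals)


def share_by_tag(items, tag_key):
    """Return each tag value's share of total cost, largest contributor first."""
    totals = cost_by_tag(items, tag_key)
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    return {value: cost / grand_total for value, cost in ranked}


if __name__ == "__main__":
    for team, share in share_by_tag(line_items, "team").items():
        print(f"{team}: {share:.0%}")
```

Grouping untagged resources into their own bucket is a deliberate choice in this sketch: it makes gaps in tagging visible as a cost line of their own.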
After that, we set a target for how much each component should actually cost, and that is how we really sketched out a plan for reducing the overall cost by 50%.

Pooja: This must have been a massive exercise. Can you tell us about the different optimizations that you did? Did you have to work across teams for this as well?

Tarun: Yes, it was. One major theme was removing redundant components: we identified where the same data was stored in two different formats, or where two pipelines were producing the same or similar data.
We identified these and removed them. Another theme was right-sizing and auto scaling. For right-sizing, we checked whether we were using the right kind of resources for the right things: whether we should move to a higher-memory instance with less CPU, or where we needed more CPU and less memory. The other part was auto scaling. You do not really need to be running at peak capacity all the time. For instance, at night you really don’t need all your resources to be up.
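One common way to express this kind of scale-in and scale-out decision is a proportional rule: size the fleet so that average utilisation moves towards a target. The sketch below is a generic illustration of that rule, not the policy Zapr used. The function name, the percentages, and the instance limits are all assumptions.

```python
def desired_instances(current, observed_pct, target_pct, min_instances, max_instances):
    """Proportional scaling rule (generic sketch, hypothetical parameters).

    current        -- instances running now
    observed_pct   -- average utilisation across them, as an integer percent
    target_pct     -- utilisation we would like to run at, as an integer percent
    min/max        -- hard bounds so the fleet never scales to zero or runs away
    """
    if current <= 0 or target_pct <= 0:
        raise ValueError("current and target_pct must be positive")
    # Integer ceiling division avoids floating-point rounding surprises.
    wanted = -(-(current * observed_pct) // target_pct)
    return max(min_instances, min(max_instances, wanted))


if __name__ == "__main__":
    # Daytime peak: 10 instances at 90% utilisation, target 60%.
    print(desired_instances(10, 90, 60, min_instances=2, max_instances=20))
    # Night time: 10 instances at 5% utilisation, target 60%.
    print(desired_instances(10, 5, 60, min_instances=2, max_instances=20))
```

The minimum bound keeps a small baseline running at night, while the maximum bound caps spend during unexpected spikes.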
So we established metrics for how we want to automatically scale in and scale out. Another important thing was life cycles and backups. As a company, you are always taking backups and you are not really worried about removing them, and with growing scale and time, the data you are storing also grows. So we identified which type of data we want to store for how long, and how we transition it across different storage tiers, which have different costs. Along with all these, we did a lot of core optimizations as well. We rewrote one of our core components for better efficiency, giving almost 10x performance, and we analyzed where we could compress the data and reduce both the data in flight and the data in storage. It also required working with all the engineering teams and some business teams as well. At times it was tough to get everyone on the same page, but these discussions did lead to some very innovative solutions to the problem we were trying to solve.

Pooja: That’s really impressive, Tarun. So tell me, while figuring out where each penny was spent, what was the most unanticipated place where you saved cost?

Tarun: Sure, I love to talk about this one. We see almost two billion fingerprints daily, which leads to a large influx of data. When we were looking at this component, we saw that we were doing as much “data-out” as “data-in”. For most cloud vendors, data-in is free while data-out is charged at a significant price. This was really puzzling to us because we were not sending much data out, and this was an unexpected place for the volumes to look so high.
On digging deeper, we figured out that this was because of the SSL certificate exchange that was happening, which is a must for HTTPS, the norm for security in today’s world.
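A back-of-the-envelope calculation shows why certificate bytes can add up to noticeable data-out. The sketch below assumes, purely for illustration, that every request performs a full handshake with no session reuse. The handshake count borrows the two-billion-per-day figure mentioned earlier, while the certificate sizes and the price per GB are hypothetical values, not Zapr's or any vendor's actual numbers.

```python
def daily_handshake_egress_gb(handshakes_per_day, cert_bytes_per_handshake):
    """Certificate bytes sent to clients per day, in decimal gigabytes."""
    return handshakes_per_day * cert_bytes_per_handshake / 1e9


def monthly_egress_cost(daily_gb, price_per_gb, days=30):
    """Cost of sending daily_gb out every day for a month at a flat per-GB price."""
    return daily_gb * price_per_gb * days


# Hypothetical inputs, labelled as assumptions:
HANDSHAKES_PER_DAY = 2_000_000_000  # assumes one full handshake per fingerprint
LARGER_CERT_BYTES = 5_000           # assumed certificate payload per handshake
SMALLER_CERT_BYTES = 3_000          # assumed smaller certificate payload
PRICE_PER_GB = 0.10                 # assumed flat data-out price

larger = daily_handshake_egress_gb(HANDSHAKES_PER_DAY, LARGER_CERT_BYTES)
smaller = daily_handshake_egress_gb(HANDSHAKES_PER_DAY, SMALLER_CERT_BYTES)
saved_per_month = monthly_egress_cost(larger - smaller, PRICE_PER_GB)

if __name__ == "__main__":
    print(f"daily egress: {larger:.0f} GB vs {smaller:.0f} GB")
    print(f"estimated monthly saving: {saved_per_month:.2f}")
```

Under these assumptions, a couple of kilobytes saved per handshake translates into terabytes of data-out per day, which is why the certificate payload is worth examining at this request volume.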
Initially it looked unavoidable, but we really wanted to solve this. Digging further, we figured out that each SSL certificate has a chain, and the longer the chain, the bigger the SSL certificate exchange. So we figured out how to reduce this chain and bought certificates that had a shorter chain. Just by changing the certificates, we saved a lot of cost there.

Pooja: Thank you for sharing your experiences, Tarun. As we rapidly scale, it’s really important for us to solve for the efficiency of our systems. Only when technology and business come together and work together can this real impact be achieved in the industry. Thank you for watching. We’ll catch you in the next session of our special series, ‘Tech Meets Business’, at Zapr!