Fenils Blog

Upgrading to neovim 0.12

Wed, 20 May 2026 00:00:00 GMT

I recently upgraded my neovim to 0.12.2. I have been chipping away at it slowly using the NVIM_APPNAME feature so that my day-to-day activities are not hindered. One thing I found interesting about NVIM_APPNAME: I saw a blog where the author manually created all the ~/.local/share/nvim-next etc. dirs, and I did the same cause I was not aware of any other way. Later I realized that one can literally just say NVIM_APPNAME=nvim-next ~/.local/share/bob/0.12/bin/nvim and neovim will automatically make all those dirs for you. Also yes, I use the awesome bob-nvim for managing neovim versions. With that out of the way, let's get started!

Major changes

Cool, let's talk about major changes, listing them out first:

vim.pack: Migrating to builtin package manager
lsp: Ditching nvim-lspconfig etc for vim.lsp.enable
ui2
no vimscript in config

Migrating to builtin package manager

Neovim now comes with an inbuilt package manager called vim.pack. It's currently experimental but is considered good enough for daily driving. I have used a bunch of package managers over time: vim-plug, packer, lazy and now vim.pack. While I don't care about them a lot, each migration had its reasons. vim-plug -> packer: the lua shift of the whole ecosystem. packer -> lazy.nvim: extra features like dependencies, clean config, etc. And finally lazy -> vim.pack, cause I want to reduce the count of external dependencies. With all the changes happening upstream, I am really hopeful that some day my whole config will just fit in a small file!

But I couldn't simply move to using vim.pack, I had to evaluate if it was strictly an upgrade. Factors I looked for:

Do I depend on dependencies feature of lazy?
Does it slow down my startup times?
Does separating config in vim.pack make config structure worse?

For dependencies section, I went through my plugin list and realized I had these dependencies:

aerial.nvim on nvim-treesitter and nvim-web-devicons
telescope on telescope-fzy-native, telescope-live-grep-args and plenary.nvim
nvim-treesitter-context on nvim-treesitter

These don't seem that bad, all of them are loaded much later in the neovim startup process and I could just keep them in a particular order to make the resolution pass. Well I tried it and that did work out, so ticked this off.

For my startup times, I used nvim --startuptime startuptime.log .. If you are not familiar with this command, it instructs neovim to write a log of its startup activities. To get the startup time, look for the --- NVIM STARTED --- line; the first number in that row is the time, in milliseconds, that it took to start up your neovim. I measured this and realized I hadn't used the "lazy" in lazy.nvim package manager 😭. But I never felt the need to make it go faster, cause I wasn't able to perceive a delay when starting it. The day I start noticing the delay is the day I bring down my hammer. I had done this earlier for my shell too. But after the migration, timings seemed the same, actually a bit better than lazy.nvim, so I was already happy :)
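If you would rather not eyeball the log, here is a minimal Python sketch that pulls the NVIM STARTED clock values out of a startuptime log. The function name and the sample log are made up for illustration, and I am assuming the usual layout where the first column is the clock time in milliseconds. It returns every match because the log can contain more than one NVIM STARTED line (neovim appends to an existing file).

```python
import re

# Matches lines such as "041.873  000.003: --- NVIM STARTED ---".
# The first column is the clock time since launch, in milliseconds.
STARTED_RE = re.compile(r"^\s*(\d+\.\d+)\s+\d+\.\d+:\s+--- NVIM STARTED ---")


def startup_times_ms(log_text):
    """Return the clock time (ms) of every NVIM STARTED line in the log."""
    times = []
    for line in log_text.splitlines():
        match = STARTED_RE.match(line)
        if match:
            times.append(float(match.group(1)))
    return times


# Hypothetical log excerpt for illustration, not a real measurement.
SAMPLE_LOG = """\
times in msec
 clock   self+sourced   self:  sourced script
 clock   elapsed:              other lines

000.005  000.005: --- NVIM STARTING ---
041.870  000.412: first screen update
041.873  000.003: --- NVIM STARTED ---
"""

print(startup_times_ms(SAMPLE_LOG))
```

To use it on a real log, read the file with open("startuptime.log").read() and pass the text in.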

For config structure, I really liked how lazy forced a dir structure of plugins and encouraged keeping config along side plugin installation line itself. It was a clean way. With vim.pack I had two ways:

keep installation line with config in top level plugin/ dir
keep all installation lines together and config separately

I wanted to maintain order remember? That can easily be done with a single vim.pack.add and listing down all the plugins together with their install order. But with plugin/ dir, I would have to name files 0_, 1_ etc. This is because files in plugin/ are loaded automatically by vim and neovim in sorted order. Naming files like that was a turn off for me, so I went with single vim.pack.add call and created a new dir called plugins/ in lua/ dir and shoved all plugin related config there. I enforced setup order in lua/plugins/init.lua.

With all of this out of the way, migration was really SMOOTH! And I kinda love that I am not pulling a heavy dependency like lazy.nvim in my dep tree :)

If you have advanced usecases or want to understand the feature better, there are two awesome guides: official manual and then https://echasnovski.com/blog/2026-03-13-a-guide-to-vim-pack.html. Read it end-to-end, each section has something you can take away. It is the most comprehensive guide out there right now.

One small trick I learned from the article above is that calling vim.loader.enable() speeds up startup times for free, and this was blessed upon us by Folke himself!! Ofc I added that and instantly shaved 25ms off the loading time xD

LSP

Neovim v0.12 also brings in more ergonomic LSP usage support. Now, you can place LSP server settings in the lsp/ dir and just call vim.lsp.enable(<file-name-in-lsp-dir>), and this is all the setup you need! nvim-lspconfig is now reduced to just maintaining settings for upstream LSP servers. As these rarely change, I just copied them from upstream and placed them in my lsp/ dir. I also realized I only use two of them now: rust_analyzer and taplo (toml LSP server, also for rust dev :P). With this, I was able to cut down a bunch of lines in my LSP config and also trim down the deps on nvim-lspconfig and mason-lspconfig.

ui2

This is an experimental feature where command line meets messages meets pager meets dialog windows. Honestly it's best explained in the official docs. This was an interesting change which allowed me to trim down a dep I really liked: fidget.nvim. It shows LSP progress in the bottom right corner. Now I do it in ui2 itself using this autocommand, that's it, 15 lines are all we need. I get the progress in the same place as the command line without Hit Enter prompts.

This works best with cmdheight = 0, which prevents Hit Enter prompts. The cmdheight feature was merged in 0.11 itself. I had tried it then but it felt incomplete and weird; now with ui2 it has the perfect UX, they have really nailed this! There's just one small hiccup: somehow I am not able to see macro recording messages. I could reproduce it on master with a minimal config, so it's definitely an upstream issue. Hopefully that gets fixed, but till then I have two hacky autocmds which almost do the same job, just slightly worse 🙃.

No vimscript in config

I finally took the leap and ditched all the vimscript from my config, it's completely lua based now! I know I am very late to the party, but I really wanted to keep it around so that I could use it on VMs where vim is the default. Well, what finally prompted this move was making a minimal vimscript based vim config, which I can easily drop anywhere and get productive with vim! Here's the minimal config for the curious. I also have a minimal tmux config along the same lines here.

Bugs discovered

This is an interesting section cause I usually never come across any hiccups when upgrading, neovim is a super polished, heavily tested software. People are out there doing builds super frequently to test latest and greatest! ( I was one of them till few years ago :P )

But this release I came across two interesting bugs!

macro recording message display with ui2 and cmdheight=0
a memory segfault with ui2 + invalid rtp and syntax on!

First one we have already discussed, I hope it gets fixed in an upcoming version or even next release is fine :P

For the second one, this is something severe and I was very astonished to come across it! First, link to bug report: https://github.com/neovim/neovim/issues/39815

minimal repro is just:

require('vim._core.ui2').enable()
vim.cmd [[
set rtp+=$LMAO " Cannot be a non-existing dir, needs to be an env var which does not exist
set syntax
]]

So conditions are:

ui2 should be enabled
rtp should be set to an env var which does not exist
syntax should be on

The reason I came across this is because of this weird rtp setting I had set in my config:

set rtp+=$GOPATH/src/golang.org/x/lint/misc/vim

I have no recollection why I had added this, I have messed around a lot with my config over the years, so there are artifacts I still find in weird corners. But main point being, I don't have golang installed in my system, so GOPATH is not set and hence satisfies condition we mentioned above. I am not exactly sure what is causing this crash in neovim internally, it needs some investigation 🧐.

But yeah interesting times! I created a new APPNAME with nvim-debug and managed to track it down to this config and also reproduce on latest master. I have reported it, let's see if someone upstream picks it up before I get my hands dirty 🏃.

Sides

I was checking :checkhealth vim.lsp and realized I hadn't seen :checkhealth in a while, I did that and boom, it was so much cleaner!! Now we have ✅ and ❌ and ⚠️ to show overall health of the features etc, and in general it looked really really clean!

There are also other features like :restart, etc. which I didn't delve into much cause I wasn't sure how to make use of them right now. There are a bunch of other features too, do go through all the release notes. The amount of things I have learned by reading the release notes in detail is mind blowing, 100% recommended!

Conclusion

Overall, I am super happy with this new release, only thing I couldn't change this time is colorscheme, I didn't have any lined up to try out 😓.

Other than that, I am already looking forward to more amazing things in upcoming releases (multi-cursor, looking at you). Till then, ciao!

Working with LLMs

Tue, 05 May 2026 00:00:00 GMT

As we usher in this new era of LLMs, it is interesting to see how different people are starting to work with them. And as a typical keyboard thudding monkey, I want to optimize my workflow too. Because a true master understands the tools at his/her disposal best.

The way I currently work with them is the straightforward way popularized by Claude Code: plan with it first in plan mode, then jump into implementation. I try to manually approve everything, but still I lose context in the "hit enter" hell. To overcome it, I sometimes just let it make all the changes and then go back and start editing them. Now, ideally this should work: you plan meticulously and once the plan is solid, bang on, all code will be perfect. Right? Right?

I think that's a wrong model to think how software engineers work. Most of the times, we discover/realize things on the fly, and that could be as small as a super small limited scope change to a complete re-design. So it is more of an iterative loop rather than a one shot model. In that case, one would go in and out of plan mode refining the spec as they learn more.

My questions

But before trying to refine our process, let's try to come up with the points I want answers to:

How do I know I have explored all the possible ways to attack a problem, could there be a simpler solution?
Another is breaking down abstractions at correct boundaries, I think LLMs struggle with this right now. I see a lot of people dumping code in places where it shouldn't belong in the first place. Why is it dumped there? Cause no one cared enough to think about boundaries. Well, this was a problem before LLMs too, but it's much worse right now.
Writing code by hand is a process which forces one to slow down and look at the surrounding code, think about frictions we face when coming up with code. Just being lazy and realizing a lot of things. LLMs don't have that (1). How to bring back this process of slowing down? And in what form? Hand-write everything again?
How do I trust the tests written by an LLM? The number of people who are not reading generated tests is bafflingly high. No one, literally no one I know is reading generated tests. They think if there are tests it's enough. The number of times I have found generated tests to not be helpful is actually very high. As the saying goes: "To know a man, check his trash". "To know about an implementation, check its tests".
How to find subtle problems within the implementation, Antirez put it nicely: "but still things that superficially work do not mean they are optimal."(2)

User workflows

Before we try to answer these questions, let's try to read Antirez's use of LLMs for array type support in redis (2) (3). Summarizing it the way I understood it:

He wrote first design draft completely by himself
Brought in LLM, started attacking draft from different angles, this would have likely required him asking correct questions to LLM
He read whole code line by line with extreme care. I liked this a lot: but still things that superficially work do not mean they are optimal.
He rewrote the whole implementation again in a mix of manual and LLM mode
Extensive testing, a complete month dedicated to just that

In his own words towards the end:

For high quality system programming tasks you have to still be fully involved, but I ventured to a level of complexity that I would have otherwise skipped. AI provided the safety net for two things: certain massive tasks that are very tiring (like the 32 bit support that was added and tested later), and at the same time the virtual work force required to make sure there are no obvious bugs in complicated algorithms. To write the initial huge specification was the key to the successive work, as it was the key to review each single line of sparsearray.c and t_array.c and modifying everything was not a good fit.

As we are at it, these are some ways I have seen people around me use it:

Clowns: Absolute direct vibe code, this is just dumb
GreatPretenders: Give the problem to LLM, act like they understand it by saying: "we manually accepted edits", test it on basic cases and ship to production.
Meticulously try to plan things with it, try to attack from different angles. From here on two more routes:
- Strategist: Write code using a LLM assisted autocomplete
- OldieGoldie: Write code completely by hand

We are not going to talk about Clowns and GreatPretenders at all except one statement to these people, PLEASE stop making my life difficult.

Thoughts on my questions

Now that we have everyone's workflows in place, let's try to come back to our questions. (Answers in the same bullet point number as the question)

I like Antirez's approach here, he took a month just to write the spec, and he didn't write the first draft with the help of an LLM, it was completely by himself. This is where I think Strategist and OldieGoldie get defeated. I believe the key point is: not reading the approach given by the LLM first. Cause there are times they just don't know, and they don't know what they don't know. They are not able to come up with a few of the strategies you might come up with. You could call this the creative step or whatever. I have noticed that reading the LLM's output first creates a bias in the mind, and we might also lose sight of the correct questions to ask. That's why you should try to come up with a plan on your own and then work with the LLM to attack it from different sides to solidify it.
On this part, I think there are two steps where this comes up, first is when planning i.e. 1st step and next is when actually writing code by hand and noticing a friction point. First part is addressable during first step itself, this is usually the easy part. But when it comes to the latter, I think it correlates with 3rd point of mine.
Now this is a tricky one, one needs to slow down, we slowed down once in the initial planning phase, but when next? In the iterative cycle I mentioned above, how do we slow down during the actual implementation section to notice these frictions? Well one way is converting into OldieGoldie, it is slow but definitely works! Though one could be lost completely in implementation details and want to complete it fast, which would lead us to the pre-LLM era problem of people writing absolute horrendous code without respecting any abstractions.

So, completely automated is bad, completely hand written is dicey, then Strategist wins? Well, I don't think so, again this is a point about slowing down, fancy autocompletes are not a great way to slow down and understand cross module dependencies. Well then what? I like Antirez's way here, seemingly he generated all code first as a PoC, realized few things during PoC to fix, re-generated it, assumed just reading everything in extreme detail would help but he didn't know the answer to: but still things that superficially work do not mean they are optimal. So he went back and rewrote whole implementation in a mix of manual and AI-assisted mode.

The difference between Strategist and this is using code as a throw-away signal, Antirez used the first version as a PoC, that's it. He then, rewrote the implementation in his own way completely, this makes the process so much faster and more context aware than one shotting the implementation and making abstractions etc on the fly.
For this point, Antirez said two things:
- "Everything was working, and this type has massive testing, thanks, again to AI"
- "When this stage was done, I started, during the third month, to stress test the implementation in many different ways."
I don't think there's info on what he did here. As such testing is a very subjective topic and how to do it properly for a particular system is a monster of its own. For now, I try to follow the same procedure as before, try to come up with test cases myself and then involve LLMs to expand upon them on their own and combine to form a better list. This helps avoid a bunch of test cases which add 1K lines of abstractions on their own to test a simple thing.
For this one, I think 3rd point above goes in enough details about everything. Key to this point I believe is slowing down and reading code multiple times to try and think from different angles.

This kinda also lays out how I want to try using LLMs going forward.

Unanswered Questions in Antirez's article

Taking a small detour and going back to Antirez's article, I have a few things I would have liked to understand in more details:

When he used an LLM in the planning phase, what part of it was him trying to probe questions out of the LLM as an experienced user and what part of it was the LLM finding defects/improvements on its own?
What part of codebases did he rewrite manually and what parts were rewritten using LLM? How did he decide which part to allocate to who?
How did he approach testing in general, did he check LLMs generated tests in super detail? Did he rewrite them too? What did his one month of testing look like in detail? How did LLMs help outside unit tests?

Conclusion

It was interesting to think about these things while writing this article down. I would not have imagined myself thinking about these things because I had previously been haunted by a college senior of mine being too strict on writing down a huge number of pages of LLD, HLD, PRD, etc. for club projects. Ofc we never finished the projects which he was supervising. I still don't know if all of this was coherent or just random rambling. Well, there's one thing for sure, I have something new to try and I will make sure I keep the rigor up in the LLM age! (4)

Footnotes:

(1) https://bcantrill.dtrace.org/2026/04/12/the-peril-of-laziness-lost/
(2) https://antirez.com/news/164
(3) https://github.com/redis/redis/pull/15162
(4) https://oxide-and-friends.transistor.fm/episodes/engineering-rigor-in-the-llm-age

ClaudeHeads

Thu, 09 Apr 2026 00:00:00 GMT

So firstly what are ClaudeHeads? They are people who have claude in place of their head. They literally think LLM is the only answer and can only think using them, they are just straight up bad at individual thinking, but that doesn't matter, cause LLM can solve everything given right context, given all the information in the world about thing which exists.

For me this is a problem. I joined the database industry cause I could not bear writing HTTP APIs for the rest of my life. I am not smart enough personally, but there are horrible software engineers out there, you would find shitty code in all parts of the software stack. But for something which is performance critical, needs to be correct always, and is always the black box for programmers, that would need the highest and purest levels of programmers, right right??? Well it seems I could only enjoy this dream for some time, cause LLMs have given birth to ClaudeHeads. We use an open source project named datafusion and have based our database on it. It's not a direct stock integration, we have had to make a lot of changes according to our needs, and it seems distribution is still an unsolved problem there. Also main IP of planner stays with us, single execution is not a solved problem, but open source projects are very very good!

Well, given that it is open source, of course LLMs are trained on it. Now, that is one part of the equation, in recent months they have also become good enough to interact with private parts of our codebase. Migration to a datafusion based engine is a recent enough project and we had been working hard to get performance on TPCDS-like benchmark[1], TPCH-like benchmark, Clickbench, and a bunch of internal benchmarks. We were very very slow as compared to our good old internal custom developed Java engine, as compared to Databricks, as compared to Snowflake. The whole team was heads down working on getting perf better than all of the above combined. Me and other senior colleagues took an "old school" methodical approach of looking at heap profiles, CPU flamegraphs, and custom metrics collected by us, and finding gaps in our understanding of the system, as not the whole codebase was familiar to us yet. Here enters my ClaudeHead hero, who downloads research papers of Datafusion/Arrow etc, keeps them in a folder, keeps all TPCDS queries, their flamegraphs, heap profiles and metrics together, and sends the agents to "find perf improvements". The result was pretty shit with earlier models, but what about recent ones? You leave them for a night and they conjure up a bunch of things. Tho how do you test them?

So, incidentally someone in my company developed an easy-to-use benchmark setup. What was left now? Multiple branches started getting created, purely vibe coded and benchmarked in parallel. Whatever improved perf was posted as it is after "understanding from Claude summary" to the channel and merged to main. Well the problem is the "understanding" part is absent; if asked to reason about the change from different angles, like architectural correctness, my friend would turn around and just ask Claude. There's no head working there, it's just Claude. Well how do I know this? Cause I have asked questions around why some part of it didn't make sense in the larger scheme of things. Why, even tho the metric shows there's nothing to optimize, do you keep repeating there's an optimization in a specific region, without it being backed by proof, just because Claude said that? What's worse, this is a junior engineer just entering the field. Not good at coding, not good at databases, not good at CS fundamentals. But given LLMs, he can keep on posting perf optimizations and get them merged. One could argue, if those don't make sense why can't you prove them wrong? Well here comes the main point of the article, my views on ClaudeHeads and how they are correct at times, but expert bullshitters at times. Back in the days, when no LLMs existed, if someone bullshitted, they had to put in a LOT OF EFFORT to even get something remotely good out, btw this is considering that it was still considered easy, Brandolini's law. During the process, they learned 100s of things and would definitely come out as a much better engineer, but now? Tell the LLM to fire off and conjure stuff, and what if that does not make sense in the grand scheme of things and would literally break in just a different environment (someone shares my feelings). Well that's benchmaxxing. But what if you could keep benchmaxxing again and again for each dataset? That's not exactly what's happening, but I am thinking through scenarios.

Not understanding what changes you are making to me is the biggest risk of all time, and it breaks what I thought about before starting to work on databases. It's not a race of understanding, now it's a race of trying random folder structures with random bits of information to get the best output out of LLM models. Oh guess what, I am still stuck in the old model, and this has caused a big disadvantage to me. Not only am I slow now, I am also losing learning opportunities myself, just because someone decided to not understand them and rely completely on LLMs. I can pick up LLMs to speed up my work, but all I have ever learnt is, slow and steady wins the race. I believe it to my heart, mental models are the biggest factors of a product. That's the reason when someone who understands codebases deeply leaves, new team gets in frenzy. That's the reason losing a product person who understood product deeply, is such a big loss. They are hard to replace. Code was never the moat, mental models were the actual secret sauce. But in this case, trash out the mental models, we will just use LLMs to not just write code, but also think, not build mental models, just straight up outsource thinking.

Writing code is among the best ways to build mental models; slow, deliberate thinking is what dials in the core ideas of anything. This applies to product building as well as programming. Having faced the "friction", and letting the mind battle with it, is the best way. I recently read a blog post in a similar vein, but talking about astrophysics: https://ergosphere.blog/posts/the-machines-are-fine/. It's an excellent read, do go through it.

But yeah, this is my problem, sorry I am not slow, I am just not a ClaudeHead.

[1] I say TPCDS-like cause I remember how badly PlanetScale was thrashed in an official blog post when the data generator used by them did not actually comply with TPCDS specifications 🙃

Estimating filter equality selectivity using NDVs

Thu, 09 Apr 2026 00:00:00 GMT

Now that I work on databases, I have a habit of keeping up with upstream datafusion PRs. Today I noticed an interesting PR talking about usage of NDVs in equality filter selectivity. I have always been fascinated by NDVs cause my colleagues in planner team always mention them as something super helpful. I started looking into the PR and it turned out to be a small one, but there was a review on it and honestly I did not understand it at all. So I sat down to do some reading on how this works.

Firstly, what are NDVs? NDVs are number of distinct values, these are usually stored at parquet file level. We can also compute NDVs for a column, i.e. how many distinct values a single column contains.

What is filter selectivity? For a filter supplied in a query, the fraction of rows selected by it is called its selectivity. E.g. a filter which selects 50 rows out of a total of 100 has a selectivity of 50%.

Now let's understand how they come together. Let's say we have a join query like this:

select * from A, B where A.x = B.y;

In this, we have our join condition as A.x = B.y, if we can predict what will be the selectivity of this filter expression we can make interesting decisions based on it. A good example is: do we want to use partition-wise join or non-partition-wise join? ( i.e. if we have loads of rows to join on, we can distribute them across cores instead of doing it on single core itself )

NDVs help us do exactly that. Let's say we have a condition where y = 42. And let's say the y column has 5 distinct values, which means our NDV count is 5. As we don't have exact histograms telling us about the data distribution, we assume each value is "uniformly distributed" across the whole column. E.g. if the y column is made up of {38,39,40,41,42} and has 100 values in total, we assume there are 20 values of 38, 20 values of 39 and so on. This assumption means the probability of 42 getting matched is equal to that of every other distinct value, i.e. 1 / 5. If we multiply this with the total number of rows in the column, we get the estimated row count for y = 42 as 20 (a selectivity of 20%). The key point here is that we are assuming a uniform distribution; if we had histograms, we could tell exactly how many rows have value 42 in the column, but NDVs work as the next best thing.
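To make the arithmetic concrete, here is a small Python sketch of the uniform-distribution estimate. The function name is hypothetical and this is not how datafusion implements it, just the same calculation written out for the {38,39,40,41,42} example.

```python
def estimate_equality_rows(total_rows, ndv):
    """Estimated rows matching `col = constant`, assuming every distinct
    value is equally frequent (uniform distribution)."""
    if ndv <= 0:
        raise ValueError("ndv must be positive")
    return total_rows / ndv


# The example column: 5 distinct values, 20 rows each, 100 rows in total.
column_y = [38] * 20 + [39] * 20 + [40] * 20 + [41] * 20 + [42] * 20

ndv = len(set(column_y))                                # 5 distinct values
estimated = estimate_equality_rows(len(column_y), ndv)  # 100 / 5
actual = column_y.count(42)                             # what a histogram would tell us

print(ndv, estimated, actual)
```

Here the estimate and the actual count agree only because the example data really is uniform; with skewed data the 1/NDV estimate would drift away from the real count.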

This estimation of rows helps in join order estimation, join type estimation, etc.

After understanding this I noticed there was a review comment on the PR and tried to decode that. Review was as follows

I think this new `1 / distinct_count` branch is a little too broad as written. Right now it fires whenever the pruned interval collapses to a single value, but that is not quite the same thing as proving we have an equality filter.

For example, if the incoming stats already describe a singleton interval, or if a conjunction of inequalities narrows the range to one point without actually adding any selectivity beyond the existing stats, we would still scale by `1 / NDV` here and end up under-estimating the row count.

This was a total bouncer for me, it was so high that if this were a cricket match, umpire would call it a WIDE. But let's try to break it down, so author's current condition to use 1/NDV is as follows:

if ...
    target.distinct_count
    && distinct_count > 0
    && !target_interval.lower().is_null()
    && target_interval.lower() == target_interval.upper() {...}

Here, interval means zone maps, i.e. the min/max values of that column. In our case above, y would have min/max values of {38,42}.

The condition checks that the NDV count is not zero and that target_interval's lower value is the same as its upper value. If everything passes, we assume our filter selectivity is 1/NDV.

According to reviewer, 1/NDV estimation is incorrect in following cases:

"if the incoming stats already describe a singleton interval"
"if a conjunction of inequalities narrows the range to one point without actually adding any selectivity beyond the existing stats"

Both of these reviews at the core address the problems of:

shape of data changes as it gets processed by different operators

singleton does not guarantee an equality filter source

Let's try to understand the above with an example. Let's say we have two filters on our y column due to some CTE/subquery etc:

first being: y >= 41 AND y <= 42
and second being: y > 33 AND y < 42 (non equality condition)

After first filter we would have:

bounds: [41, 42]
NDV count: 5 (notice it didn't change)

When we come to the second filter and apply the bounds to the predicate, we only get rows containing 41. Here we will predict selectivity as 1/NDV. This is the exact problem. Let's say the first filter lets 70 rows out of 100 through, i.e. it has a selectivity of 70%. Now let's say we have 35 rows of 41 and 35 rows of 42; after applying the second filter 35 rows remain, i.e. 50% selectivity. But if we go by the NDV route, we get 70/5 i.e. 14 rows, which is a super low estimate!
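The same example can be written out as a small Python sketch. All names here are hypothetical and the logic is a simplification of the condition under review, not datafusion's code: the interval is narrowed using integer bounds, and when it collapses to a single value the estimate scales by 1/NDV using the stale NDV of 5.

```python
def narrow_int_interval(bounds, greater_than, less_than):
    """Intersect integer bounds [lower, upper] with `greater_than < y < less_than`."""
    lower, upper = bounds
    return max(lower, greater_than + 1), min(upper, less_than - 1)


def estimate_rows(input_rows, ndv, interval):
    """Mimic the singleton-interval branch: scale by 1/NDV when lower == upper.
    The fallback (no reduction) is a simplification for this sketch."""
    lower, upper = interval
    if ndv > 0 and lower == upper:
        return input_rows / ndv
    return float(input_rows)


# Rows surviving the first filter: bounds [41, 42], NDV statistic still 5.
rows = [41] * 35 + [42] * 35
stale_ndv = 5

# Second filter: y > 33 AND y < 42, which is not an equality condition.
interval = narrow_int_interval((41, 42), 33, 42)
estimated = estimate_rows(len(rows), stale_ndv, interval)
actual = len([y for y in rows if 33 < y < 42])

print(interval, estimated, actual)
```

The interval collapses to a single point even though no equality filter was involved, so the 1/NDV branch fires and estimates 14 rows while 35 actually pass.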

Our NDV count did not change as data flowed through both filters, and the same phenomenon can happen with different operators in the middle. We also saw that even though we got a singleton interval as a result, it came from a pair of inequalities and not from an equality filter.

This was an interesting dive, which confused me a lot at different places, even while writing this down!

References:

https://learn.microsoft.com/en-us/sql/relational-databases/performance/cardinality-estimation-sql-server?view=sql-server-ver17
https://github.com/apache/datafusion/pull/20789/
https://blobs.duckdb.org/papers/tom-ebergen-msc-thesis-join-order-optimization-with-almost-no-statistics.pdf

Chinaga Betta Hike

Sun, 08 Feb 2026 00:00:00 GMT


Chinaga betta is a nice little day hike near Bengaluru. It is said to be 2.1 kms one way, so a total of 4.2 kms up and down. It's a simple hike and can be done with family and friends. I recently got to visit it and I am gonna mention how the experience was. First thing is it needs a permit, so book it from the Aranya Vihaara website. Choose a convenient slot, in my time it was just 6 AM to 6:30 AM. This is needed cause it's said that a forest ranger will accompany you to the top. I say it that way cause, surprise surprise, there was no one when we reached there.

The trek starts from the base of the Torana Anjaneya Swami Temple. This is where the forest department is supposed to check your IDs before you start. There were a lot of locals when we reached there; the temple seemed super active, and when we were returning it also seemed they were in the process of sacrificing a goat. We didn't hang back to see that. Well, that's for a later bit. First, at the start, when we were approaching the base of the temple we were very scared, as it was pitch black, and once we entered the forest side it started to feel like off-roading. Driving through a lonely road at night with no one in sight was a bit concerning, but when we reached there and saw a few fellow hikers we were relieved.

We waited for the forest ranger till 6:45, but when we realized we had been played for fools, we just started on our own. So yeah, 250 rupees went in vain :(

Okay, next thing, let's talk about the actual trail. We followed the trail from this website. I would divide the whole hike into four sections: first is big temple to small temple, next is a rocky/slabby, tiring uphill section, next is the flatlands, and finally the last remaining part to the summit.

A forest ranger is mostly not needed for the majority of the hike; it's just that at the top there's a vertical rock which you have to climb, and most people would not be comfortable doing that. I was a climber, so I climbed it pretty easily (subtle flex xD). The hike starts at the back of the temple, and there's an outward-protruding rock at the top with a flag above it; that is your summit.

For the first part, when we start from the back of the temple, there's a trail which seems to go into the forest; follow that. Once you follow it you will reach another small temple. There will be two paths there, and just like Robert Frost's protagonist, we have to take the road less taken. That is the first part done; it's just light walking, a perfect warmup for the next tiring section.

The next section is where the uphill starts. It's through mildly dense forest; well, it was less dense cause we could see that people there had burnt a lot of trees to keep it clear for the trail. This section is also where we saw the sunrise. It would have been better to watch it from the flatlands above, but we were late due to waiting for the forest ranger :(

This section is also slippery at times, two of my friends survived the slip, but it could get dangerous, so either wear good shoes or be extra careful where you are stepping and how you are shifting body weight. It shouldn't be a problem for most people, but extra caution is never harmful.

There's a section between the flatlands and this uphill where you will start seeing eucalyptus trees. They are beautiful, with little yellow flowers on them. This gave me a feeling of walking through the magic forest of Berserk. If locals hadn't burnt a lot of trees, I wonder if I would have declared it the Garden of Eden.

Also, while going uphill you will notice white arrows; follow them. Though you could also just rely on the trail map you have downloaded; it was on OpenStreetMap on the IndiaHikes website, so you could use any FOSS maps app to view it.

The next section is the flatlands, and it is what the name says. I don't think it's called flatlands in any blog or anything, but I am calling it that cause of the Minecraft biome xD This is one spot where you can catch the sunrise. Best would be the top, but even this is fine. From here you start getting views of the surrounding area. It's serene; breathe in and get ready for the remaining part!

Now, next is walking through some part of the flatlands to reach the far bottom of the last section; there should be an easily visible trail starting there. Just follow that.

As you walk through the last section, you will come across two big rocks creating a narrow space between them. You have to squeeze and pass through; that is also very slippery, but all I could think at that point was whether I could climb this chimney 😂

Once you complete that, you reach the final vertical rock which you have to climb. There's a rope there to assist, but it seemed we were the first ones to reach, so it was thrown up above!? Not sure why someone would do that. It looks like this:

[Image: Vertical wall to climb]

It doesn't look like much, but there's no proper footing down below, so if you slip you can get injured. And getting up is one thing; getting down is scarier unless you have someone looking at your feet and telling you where the footholds are.

The footholds are also carved into the rock rather than projecting outwards, which is part of the reason why you have to look for them when getting down. Well, getting back to our case, I was the guy who got pushed forward to climb and get the rope from above. I had no safety, so my friends were a bit concerned, but I was confident as I had climbed much higher rocks with much dicier footholds and handholds in Hampi. I threw the rope down and assisted everyone else in climbing up. As we reached the top, we could get a much clearer view of everything around; it's a beautiful place. On one side, you see mountains as far as the eye can reach, covered with a blanket of clouds. On the other side, you see tiny settlements among smaller hills, perfect for people to make actual small towns around. While it's not a lot of height, the wind there was super strong. So if you are lean and lightweight, please don't fly off xD

We had a dog guide us the whole time; she was so cute and playful. Unfortunately she disappeared when we completed the hike, so we couldn't treat her :( We also could not carry her to the topmost section, as that was a climb on a straight rock. It wasn't the biggest, but it was not possible for us to do with a dog.

And yeah, that's it, we came down the same path. But there's another surprise waiting for you after the hike: we stopped by the Swandenahalli Lake. We think it's a small pond rather than a lake. We chilled there for some time and skipped some stones, which itself was good enough to offset the lake vs pond disappointment :)

And that's it. On the way back we tried Pavithra Idli Hotel's benne thatte idli, vada and masala dose. They have been cooking since 1942 and are pretty famous; we had to wait 10-15 minutes to get a seat on a Saturday morning. The benne thatte idli wasn't up to the hype for me, I have had better ones in Jayanagar. But the masala dose was better, and we washed everything down with a hot filter coffee, always the best part for me xD It's worth a try once :)

And that's it, enjoy and have a nice trip.

References: https://indiahikes.com/documented-trek/chinaga-betta-trek

Understanding Snapshots in Apache Iceberg

Wed, 12 Feb 2025 00:00:00 GMT

External Link

https://www.e6data.com/blog/apache-iceberg-snapshots-time-travel
https://archive.is/VJ7pE

Sink consistency in RisingWave

Wed, 12 Feb 2025 00:00:00 GMT

NOTE: I am mostly writing this down to present to someone who is already familiar with the system, but I have laid down some ground work to make it slightly better. Write up is also heavily code referential, so sorry if that's not up your alley.

RisingWave is a popular open-source streaming database. It can work with a variety of sources and sinks, and it can run performant real-time analyses on streaming data alongside serving ad-hoc queries. Basically a lot of buzzwords.

I have grown interested in the system and was trying to understand how it prevents data loss with so many different sinks in case one of its compute nodes dies. We will be looking into the handling of the iceberg sink cause that's what I am working with these days. I am going to assume familiarity with iceberg already, cause explaining that would take another several blog posts.

One of the good features of iceberg is its decoupling between data files and metadata files. One can take existing parquet files and create a table out of them easily. Work for the same is active in iceberg-rust. Iceberg writers do the same even when committing: they write data files first and then try to write metadata files. If the commit fails (it may fail cause another writer's commit landed first, and accepting both would break the table's ACID guarantees), they just have to re-generate the metadata files and try to commit again.

So writing data files vs committing are separate processes, same happens in iceberg-rs and hence RisingWave, for iceberg sink these are the locations where each occurs:

Writing happens under IcebergSinkWriter here
Committing happens here

Getting back to RisingWave: the core idea for persisting such state in databases is to use some kind of log, and a lot of databases have their own WAL implementation. RisingWave also leverages the concept of log stores for the same. These are the current log store implementations:

In Memory Log Store
KV Log Store

Now our doubt was: what if a RisingWave compute node crashes before the commit happens? Log stores implement LogReader. The LogReader abstraction shows which methods it provides, namely:

init
next_item , read next item in log
truncate , increments read offset in log
rewind , decrements read offset in log
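To build some intuition for how these four methods fit together, here is a toy in-memory model in C. It is only how I picture the offsets, under the assumption that truncate marks everything read so far as safely delivered and rewind goes back to that mark; the struct and the toy_ function names are hypothetical, and the real LogReader is an async Rust trait with richer item types.

```c
#include <assert.h>
#include <stddef.h>

/* Toy log: read_pos is the next item to hand out, truncate_pos is the
 * first item that is NOT yet known to be committed downstream. */
typedef struct {
    const int *items;
    size_t len;
    size_t read_pos;
    size_t truncate_pos;
} toy_log_reader;

void toy_init(toy_log_reader *r, const int *items, size_t len) {
    r->items = items;
    r->len = len;
    r->read_pos = 0;
    r->truncate_pos = 0;
}

/* Returns 1 and stores the next item in *out, or 0 when the log is drained. */
int toy_next_item(toy_log_reader *r, int *out) {
    if (r->read_pos >= r->len) return 0;
    *out = r->items[r->read_pos++];
    return 1;
}

/* Everything read so far is durable downstream; it never needs replaying. */
void toy_truncate(toy_log_reader *r) {
    r->truncate_pos = r->read_pos;
}

/* After a failure, go back to the last truncated offset and replay from it. */
void toy_rewind(toy_log_reader *r) {
    assert(r->truncate_pos <= r->read_pos);
    r->read_pos = r->truncate_pos;
}
```

In this model, items read after the last toy_truncate call are handed out again after toy_rewind, which is the property that lets a sink retry uncommitted work.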

These methods are used along side RisingWave's internal global clock to make sure no data is lost. Hierarchy of internal clock looks like this:

barriers every configurable ms, configurable using barrier_interval_ms in system params
checkpoints every N barriers, configurable using checkpoint_frequency system param
commits every N checkpoints, configurable for iceberg sink using commit_checkpoint_interval

So we can keep reading data asynchronously on every barrier using next_item and keep calling truncate on every commit. This would ensure we lose no data for different types of sinks.

Let's see what happens for iceberg sink:

Firstly, how LogReader relates to our iceberg writer. LogReader is used by LogSinker , in our case DecoupleCheckpointLogSinker here and that finally calls:

For writing: write_batch here
For committing: commit here, this follows central clock of barriers

Now, DecoupleCheckpointLogSinker also listens to central clock of barrier and writes data files to object store on each barrier here (i.e. call close method on data file writer), but it actually commits the result on every N checkpoints.

So technically, if barrier and checkpoint values are not same and a compute node crashes between two checkpoints, we would have written data files to object store, but it would not be committed i.e. no metadata files. So these would fall under table maintenance job of orphan files.
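A rough way to put numbers on this, as a sketch in C. This is a simplified model and not RisingWave code: it assumes data files are flushed on every barrier, that a commit lands exactly every checkpoint_frequency * commit_checkpoint_interval barriers counted from 1, and it ignores a crash in the window between flushing and committing on the same barrier. Both function names are hypothetical.

```c
#include <assert.h>

/* Number of barriers between two iceberg commits in this simplified model. */
long barriers_per_commit(long checkpoint_frequency,
                         long commit_checkpoint_interval) {
    assert(checkpoint_frequency > 0 && commit_checkpoint_interval > 0);
    return checkpoint_frequency * commit_checkpoint_interval;
}

/* Barriers whose data files were written but not yet committed when the
 * node crashes right after barrier number crash_barrier (1-based). */
long uncommitted_barriers_at_crash(long crash_barrier,
                                   long checkpoint_frequency,
                                   long commit_checkpoint_interval) {
    assert(crash_barrier >= 0);
    return crash_barrier %
           barriers_per_commit(checkpoint_frequency, commit_checkpoint_interval);
}
```

For example, with arbitrarily picked values checkpoint_frequency = 10 and commit_checkpoint_interval = 6, a commit happens every 60 barriers, so a crash right after barrier 75 leaves 15 barriers' worth of data files without a commit in this model. With both set to 1 the result is always 0.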

This can be mitigated by simply setting checkpoint_frequency to 1 i.e. trigger at every barrier and also commit_checkpoint_interval to 1 i.e. commit on every barrier/checkpoint.

Now, how do we increase the batching size? That can be done by configuring barrier_interval_ms. Though this could be a bad idea cause barriers are used internally for a lot of other things; they are like ticks in the Minecraft engine. So making everything slower for the sake of batching can make us lose other internal system state/data, leaving the system in a weird non-recoverable condition.

Lets write a Brainfuck Interpreter: Optimizations

Sun, 26 May 2024 00:00:00 GMT

In the last part, we wrote a naive implementation of brainfuck, which is painfully slow. Let's try to optimize it; we will discuss two major optimizations in this blog. We will end up with really nice speedups at the end, so buckle up and let's go!

First optimization

One of the best things about implementing brainfuck is that its implementation is simple and straightforward, and hence one can find optimization opportunities relatively easily. We don't try to plot a flamegraph cause we know most of the time is spent in the exec function; that's where we execute all of our operations, so any optimization done in that flow would give us directly noticeable speedups.

Let's look at the implementations of our operators again; this is the core loop right now: https://github.com/feniljain/brenphuk/blob/6b00f84be79c00679dc28ba917b853ff2e18beea/interpreter.c#L66-L142 . There's not much to see: the implementations for >, <, +, -, . and , are pretty simple, one-liners even :P . So let's have a look at the multi-liners, i.e. the loop implementations. Here the hottest path will definitely be finding the corresponding loop operator. Let's say we have a program like:

[:::[::]:::[::[::]::]]
^1  ^2     ^3 ^4

(: here means any random operator.) We have 4 loops in total: 1 is the parent loop of all, containing 2 and 3 as its immediate child loops, and finally 4 sits inside 3. Let's say 1 repeats 5 times. In a single iteration of loop 1 we will be finding the end of loop 2 once, which makes this find operation happen 5 times. Now let's say loop 3 executes 10 times per iteration of loop 1; for loop 4 we will execute the find operation 10 * 5 = 50 times. This is wasted computation. We can do this computation once and store it for the whole execution of the program.

So do we make a kind of caching mechanism to store this just for the inner loops? Technically we also have to jump for the outer loops, so we do need jump indices for them too, but only once for the outermost loop and fewer times for depth-one loops. What if we precompute all bracket locations? We already compute them while executing; maybe we can do it before execution starts and then reference them to jump around easily. Let's give it a try and see our benchmark results.

We make an array as big as the program and fill it with -1 values; at the exact index of each loop operator we fill in the index of its corresponding loop operator. So we create two arrays: open_brackets_loc and close_brackets_loc. Now, just before entering the core loop of exec, we call a new function called fill_brackets_loc; this takes in the program and its length, calculates all bracket locations and fills them into our arrays. The implementation is simple: we find a [ and maintain a counter till we find the corresponding ], same as what we did in the last blog post, but we only do it once this time, at the very start. The code looks like this:

void fill_brackets_loc(char *prog, int prog_len) {
  int i = 0, next_open_bracket_loc = -1;

  while (i < prog_len) {
    switch (prog[i]) {
    case '[': {
      int brackets_depth = 0;
      for (int j = i; j < prog_len; j++) {
        if (prog[j] == '[') { // found a new loop start operand
          if (next_open_bracket_loc == -1 && j != i) {
            next_open_bracket_loc = j;
          }
          brackets_depth++; // increase the counter
        } else if (prog[j] == ']') { // found a new loop end operand
          brackets_depth--; // decrease the counter
        }

        if (brackets_depth == 0) {
          open_brackets_loc[i] = j; // filling in our arrays
          close_brackets_loc[j] = i;
          break;
        }
      }

      if (brackets_depth != 0) {
        ABORT("brackets mismatch"); // oops didn't find corresponding loop operand
      }

      break;
    }
    default:
      break;
    }

    if (next_open_bracket_loc != -1) {
      i = next_open_bracket_loc;
      next_open_bracket_loc = -1;
    } else {
      i++;
    }
  }
}

We can even do one better by also storing each [] pair identified when traversing nested loops. But for now, this works :P
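For the curious, that "one better" version could look roughly like the sketch below: a single pass that keeps a stack of unmatched [ positions, so every pair, nested or not, is recorded the first time it is seen. This is not what brenphuk does; match_brackets_one_pass and its parameter names are hypothetical, and the output arrays are passed in instead of being globals.

```c
#include <assert.h>
#include <stdlib.h>

/* Fills open_to_close[i] = j and close_to_open[j] = i for every matching
 * pair and -1 everywhere else. Returns 0 on success, -1 on a bracket
 * mismatch or allocation failure. Both arrays must hold prog_len ints. */
int match_brackets_one_pass(const char *prog, int prog_len,
                            int *open_to_close, int *close_to_open) {
    assert(prog_len >= 0);
    int *stack = malloc(sizeof(int) * (size_t)(prog_len > 0 ? prog_len : 1));
    if (stack == NULL) return -1;
    int top = 0; /* number of '[' seen so far that are still unmatched */

    for (int i = 0; i < prog_len; i++) {
        open_to_close[i] = -1;
        close_to_open[i] = -1;
    }
    for (int i = 0; i < prog_len; i++) {
        if (prog[i] == '[') {
            stack[top++] = i; /* remember where this loop started */
        } else if (prog[i] == ']') {
            if (top == 0) { /* a ']' with no '[' before it */
                free(stack);
                return -1;
            }
            int open = stack[--top];
            open_to_close[open] = i;
            close_to_open[i] = open;
        }
    }
    free(stack);
    return top == 0 ? 0 : -1; /* leftover '[' means a mismatch */
}
```

Each character is visited once, so the cost is linear in the program length, whereas fill_brackets_loc above rescans the body of a loop once for every level of nesting.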

Now our [ handler in exec looks like this:

case '[':
  if (tape[pointer] == 0) {
    int idx = open_brackets_loc[i];
    if (idx == -1) {
      DBG_PRINTF("[: got bracket_loc as -1 for i: %d", i);
      ABORT("invalid state");
    }
    i = idx;
    continue;
  }

  break;

We directly look up the location of the corresponding loop operator and jump!

same for ]:

case ']': {
  if (tape[pointer] != 0) {
    int idx = close_brackets_loc[i];
    if (idx == -1) {
      DBG_PRINTF("]: got bracket_loc as -1 for i: %d", i);
      ABORT("invalid state");
    }
    i = idx;
    continue;
  }

  break;
}

Running our benchmarks now gives us: Factor: ~7s Mandelbrot: ~22s

That's some big gains from a simple observation! But wait we have more :)

Second optimization

Before this one, I made a small change to our core exec: instead of using characters, I am using enum variants for identifying each operation. It's essentially the same thing as before, just a different representation. For the conversion between character operations and enum variants I wrote a simple parse function:

enum Op_type {
  INVALID = 0,
  FWD,
  BWD,
  INCREMENT,
  DECREMENT,
  OUTPUT,
  INPUT,
  JMP_IF_ZERO,
  JMP_IF_NOT_ZERO,
};

void parse(char *prog, int prog_len) {
  int i = 0;

  while (i < prog_len) {
    enum Op_type op_type = INVALID;
    switch (prog[i]) {
    case '>':
      op_type = FWD;
      /* fall through: the four cases below share one break */
    case '<':
      if (op_type == INVALID)
        op_type = BWD;
      /* fall through */
    case '+':
      if (op_type == INVALID)
        op_type = INCREMENT;
      /* fall through */
    case '-': {
      if (op_type == INVALID)
        op_type = DECREMENT;
      break;
    }
    case '.':
      op_type = OUTPUT;
      break;
    case ',':
      op_type = INPUT;
      break;
    case '[':
      op_type = JMP_IF_ZERO;
      break;
    case ']':
      op_type = JMP_IF_NOT_ZERO;
      break;
    default:
      break;
    }

    if (op_type == INVALID) {
      i++;
      continue; // this can happen when there are comments which are supposed to
                // be ignored
    }
    i++;
  }
}

Now, an interesting optimization I have seen done in bytecode interpreters is combining instructions when they occur together way too often. This could happen with the same or with different instructions. I learnt about this for the first time while completing Crafting Interpreters (https://craftinginterpreters.com/), an amazing book by Bob Nystrom. So let's try to find out if it is possible to combine any instructions in our case. We add an array of size [number of instructions][number of instructions], because we want to check how each instruction relates to the other ones. At the end of the parse function we add this code to make it record op_assoc:

ops[++ops_len] = op;
if (ops_len > 0) {
    // This logic simply tries to unite op_assoc[1][5]
    // and op_assoc[5][1] into one single field
    int op_type_1 = (int)ops[ops_len].op_type;
    int op_type_2 = (int)ops[ops_len - 1].op_type;
    if (op_type_1 >= op_type_2) {
      op_assoc[op_type_2][op_type_1]++;
    } else {
      op_assoc[op_type_1][op_type_2]++;
    }
}

We try to unite results of form op_assoc[i][j] and op_assoc[j][i] into one field op_assoc[i][j], cause we don't want to see associativity of + and [ and [ and + as separate results. With this done, let's try to get output of it for mandelbrot:

DEBUG: op_assoc[1][1]: 3506
DEBUG: op_assoc[1][3]: 438
DEBUG: op_assoc[1][4]: 337
DEBUG: op_assoc[1][5]: 3
DEBUG: op_assoc[1][7]: 498
DEBUG: op_assoc[1][8]: 568
DEBUG: op_assoc[2][2]: 3604
DEBUG: op_assoc[2][3]: 386
DEBUG: op_assoc[2][4]: 246
DEBUG: op_assoc[2][5]: 3
DEBUG: op_assoc[2][7]: 362
DEBUG: op_assoc[2][8]: 521
DEBUG: op_assoc[3][3]: 224
DEBUG: op_assoc[3][7]: 30
DEBUG: op_assoc[3][8]: 86
DEBUG: op_assoc[4][4]: 2
DEBUG: op_assoc[4][7]: 462
DEBUG: op_assoc[4][8]: 133
DEBUG: op_assoc[5][7]: 1
DEBUG: op_assoc[5][8]: 1
DEBUG: op_assoc[7][7]: 10
DEBUG: op_assoc[8][8]: 32

The highest ones are (2, 2) and (1, 1), so repeating instructions, specifically < and >; these should be easy to club. Let's do just that: we will add a repeat field for each operation, which stores how many times the operation repeats. Then we can make the exec function move or increment by repeat's value instead of just 1. After this change our exec function looks like this:

int exec(char *prog, int prog_len) {
  DBG_PRINT(prog);
  int i = 0, val;

  parse(prog, prog_len);
  // print_op_assoc(); // This is for checking which all ops occur together
  fill_brackets_loc();

  while (i <= ops_len) {
    // start = clock();
    switch (ops[i].op_type) {
    case FWD:
      pointer += ops[i].repeat; // We increment by `repeat` now
      break;
    case BWD:
      pointer -= ops[i].repeat; // We decrement by `repeat` now
      break;
    case INCREMENT:
      val = (int)tape[pointer];
      val += ops[i].repeat; // We increment by `repeat` now
      tape[pointer] = (char)val;
      break;
    case DECREMENT:
      val = (int)tape[pointer];
      val -= ops[i].repeat; // We decrement by `repeat` now
      tape[pointer] = (char)val;
      break;
    case OUTPUT:
      printf("%c", tape[pointer]);
      break;
    case INPUT: {
      char ch = (char)getchar();
      tape[pointer] = ch;
      break;
    }
    case JMP_IF_ZERO:
      if (tape[pointer] == 0) {
        int idx = open_brackets_loc[i];
        if (idx == -1) {
          DBG_PRINTF("[: got bracket_loc as -1 for i: %d", i);
          ABORT("invalid state");
        }
        i = idx;
        continue;
      }

      break;
    case JMP_IF_NOT_ZERO: {
      if (tape[pointer] != 0) {
        int idx = close_brackets_loc[i];
        if (idx == -1) {
          DBG_PRINTF("]: got bracket_loc as -1 for i: %d", i);
          ABORT("invalid state");
        }
        i = idx;
        continue;
      }

      break;
    }
    case INVALID:
      ABORT("INVALID shouldn't have leaked till here, there's a bug in parsing "
            "code");
    default:
      break;
    }

    i++;
  }

  return 0;
}
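The exec above only consumes the repeat field; filling it happens while parsing. A minimal sketch of how that could look is below. It is not copied from the repo: struct Op is a hypothetical layout (the post only shows that an op has op_type and repeat), and parse_with_repeat, classify and is_repeatable are names made up for this example. It also writes into a caller-provided array rather than the global ops.

```c
#include <assert.h>
#include <stddef.h>

enum Op_type {
  INVALID = 0, FWD, BWD, INCREMENT, DECREMENT,
  OUTPUT, INPUT, JMP_IF_ZERO, JMP_IF_NOT_ZERO
};

/* Hypothetical layout, see the note above. */
struct Op {
  enum Op_type op_type;
  int repeat;
};

static enum Op_type classify(char c) {
  switch (c) {
  case '>': return FWD;
  case '<': return BWD;
  case '+': return INCREMENT;
  case '-': return DECREMENT;
  case '.': return OUTPUT;
  case ',': return INPUT;
  case '[': return JMP_IF_ZERO;
  case ']': return JMP_IF_NOT_ZERO;
  default:  return INVALID; /* comment character */
  }
}

/* Only >, <, + and - are merged; exec uses repeat for just these four. */
static int is_repeatable(enum Op_type t) {
  return t == FWD || t == BWD || t == INCREMENT || t == DECREMENT;
}

/* Parses prog into out (capacity out_cap). Returns the number of ops
 * written, or -1 if out is too small. */
int parse_with_repeat(const char *prog, size_t prog_len,
                      struct Op *out, size_t out_cap) {
  size_t n = 0;
  assert(prog != NULL);
  for (size_t i = 0; i < prog_len; i++) {
    enum Op_type t = classify(prog[i]);
    if (t == INVALID)
      continue; /* skip comments */
    if (n > 0 && is_repeatable(t) && out[n - 1].op_type == t) {
      out[n - 1].repeat++; /* same op as the previous one: merge */
      continue;
    }
    if (n == out_cap)
      return -1;
    out[n].op_type = t;
    out[n].repeat = 1;
    n++;
  }
  return (int)n;
}
```

For instance, `>>>+[-]<<` becomes six ops: FWD with repeat 3, INCREMENT, JMP_IF_ZERO, DECREMENT, JMP_IF_NOT_ZERO and BWD with repeat 2. One consequence, as far as I can tell from exec indexing open_brackets_loc by op index: bracket locations have to be computed over the merged ops, not over the raw characters.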

Simple and easy, let's benchmark this change:

Factor: ~2.16s Mandelbrot: ~5.9s

And we get another round of massive speedups! Whole code is available at: https://github.com/feniljain/brenphuk/tree/attempt_3

This is where we halt our optimization efforts. Next we are going to learn about JITs from a systems perspective: how do we leverage kernel APIs to achieve JITting?

Lets write a Brainfuck Interpreter: Naive Implementation

Tue, 21 May 2024 00:00:00 GMT

This is a series where we will slowly climb up to building a JIT compiler for brainfuck. This is the first blog in the series, covering the language and a naive implementation. We try to understand everything from first principles, so buckle up and let's get started!

Understanding Brainfuck Language Operators and Spec

Brainfuck is a super simple language which takes the idea of a Turing machine and implements it as a programming language. That means we have a tape, an array of cells where each cell contains a number, and we just move around on it operating on the numbers.

      ----------------------------------------
Tape: |10||65||0||0||45||14||0||0||0||0||0||0|
      ----------------------------------------
                          ^
                          |
      Pointer -------------

Let's look at all the operators:

> : move right to next cell on tape
< : move left to previous cell on tape
+ : increment the value of current cell
- : decrement the value of current cell
, : take input from user and store it in current cell
. : output value stored in current cell to user
[ : jump to matching ], if value is zero
] : jump to matching [, if value is not zero

And that's it, this is the whole language, surprisingly simple right xD

There are a few properties we haven't discussed yet. They are more like implementation details, e.g. how long should the tape be? What to do if you cross the max size of the tape? What should the initial value of a cell be? These things are outlined in complete detail in this spec: https://github.com/sunjay/brainfuck/blob/master/brainfuck.md

Small Brainfuck Programs which we will use in tests of our interpreter

As we know about all operators let's try getting our hands dirty and write few small programs, this also would give us an additional benefit of having test cases ready for our interpreter. We can build incrementally harder programs and use them for our interpreter, this way we also add a nice incremental debugging test suite.

Super simple program: +++ , it just adds 1 to first cell thrice, so our tape would look like:

       ----------....
 Tape: |3||0||0||
       ----------....

Next one: ++-, add 1 to first cell twice, and then subtract 1 from it.

       ----------....
 Tape: |1||0||0||
       ----------....

Let's use < and > operators now: ++>+<-, this program first adds 1 twice to first cell, then shift to second cell, adds 1 over there, comes back to first cell and does a subtract operation. Our tape is now:

       ----------....
 Tape: |1||1||0||
       ----------....

Moving to next operators , and .: ,+., this program takes input from user, stores it in first cell, adds one to it and outputs it. Let's say we pass 65 when prompted for input, our tape would look like:

       ----------....
 Tape: |66||0||0||
       ----------....

and our output would be: B, we print ascii representations of numbers stored in cell, that's also how we get hello world too later down the road xD

Side Tip: Can't remember what ascii code represents what character? Don't worry there's a man page for it, just run: man ascii!

Now comes the l👀ps: [++], this programmm: does nothing :P . [ operator says jump to corresponding ] when current cell is zero, and by default all cell values are zero, so our program jumped to last operator of program and exited.

Okay, let's do something serious this time: +++[-] . It's a simple program. We first increment the first cell to three, then we start a loop. This time [ won't jump cause we have the value 3 in there. In the first iteration - decrements the value by one, i.e. to 2. We then hit ], which jumps to the corresponding [ if the cell has a non-zero value, so it goes back to [ and the second iteration starts, where the decrement happens and the jump happens again. This continues till the value is zero; at that point ] sees a zero in its cell, doesn't jump, and the program exits.

After exec of +++:

       ----------....
 Tape: |3||0||0||
       ----------....

After first iteration, we exec 4th, 5th and 6th operator in program here:

[ -> value is 3, don't jump, go to next instruction
- -> decrement value to 2
] -> value is 2, non-zero, jump to [

       ----------....
 Tape: |2||0||0||
       ----------....

After second iteration, we again execute 4th, 5th and 6th operator:

[ -> value is 2, don't jump, go to next instruction
- -> decrement value to 1
] -> value is 1, non-zero, jump to [

       ----------....
 Tape: |1||0||0||
       ----------....

After third iteration, we again execute 4th, 5th and 6th operator:

[ -> value is 1, don't jump, go to next instruction
- -> decrement value to 0
] -> value is 0, don't jump, go to next instruction i.e. program end

       ----------....
 Tape: |0||0||0||
       ----------....

This seems tedious to do it by hand right? No probs some humble person on internet made this for us: https://arkark.github.io/brainfuck-online-simulator/

Visualization of brainfuck programs, really helpful for debugging!

Okay, now let's use loops, cell movement, etc. together: >+++++++++[<++++++>-]<...>++++++++++. , at this point you should try to do some brain jog and figure it out on your own.

Final state of tape:

       ----------....
 Tape: |54||10||0||
       ----------....

Having difficulties understanding? Try the online playground I linked above and iterate slowly on each operator, you should be able to figure it out. (hopefully :P just kidding xD)
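If you would rather check these small programs with code than by hand, below is a compact throwaway runner in C. It is not the interpreter we build later in this post, just a sketch under a few assumptions of mine: a 64-cell tape of unsigned bytes, no input support (so , is ignored), and output collected into a buffer instead of being printed. run_bf and DEMO_TAPE_SIZE are made-up names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_TAPE_SIZE 64

/* Runs prog on a zeroed tape. Output bytes go to out (capacity out_cap,
 * always NUL-terminated; extra output is dropped). Returns 0 on success,
 * -1 on unmatched brackets or if the pointer leaves the tape. */
int run_bf(const char *prog, unsigned char tape[DEMO_TAPE_SIZE],
           char *out, size_t out_cap) {
    size_t len = strlen(prog), ptr = 0, out_len = 0;
    assert(out_cap > 0);
    memset(tape, 0, DEMO_TAPE_SIZE);
    out[0] = '\0';
    for (size_t i = 0; i < len; i++) {
        switch (prog[i]) {
        case '>': if (++ptr >= DEMO_TAPE_SIZE) return -1; break;
        case '<': if (ptr-- == 0) return -1; break;
        case '+': tape[ptr]++; break;
        case '-': tape[ptr]--; break;
        case '.':
            if (out_len + 1 < out_cap) {
                out[out_len++] = (char)tape[ptr];
                out[out_len] = '\0';
            }
            break;
        case '[':
            if (tape[ptr] == 0) { /* skip forward to the matching ']' */
                int depth = 1;
                while (depth > 0) {
                    if (++i >= len) return -1;
                    if (prog[i] == '[') depth++;
                    else if (prog[i] == ']') depth--;
                }
            }
            break;
        case ']':
            if (tape[ptr] != 0) { /* walk back to the matching '[' */
                int depth = 1;
                while (depth > 0) {
                    if (i == 0) return -1;
                    i--;
                    if (prog[i] == ']') depth++;
                    else if (prog[i] == '[') depth--;
                }
            }
            break;
        default: break; /* ',' and comment characters are ignored here */
        }
    }
    return 0;
}
```

Running the last program through it leaves 54 and 10 in the first two cells and collects the output 666 followed by a newline, since 54 is the ASCII code for the character 6 and 10 is a line feed.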

Okay, one last program and we are done, I just couldn't skip this program:

++++++++[>++++[>++>+++>+++>+<<<<-]>+>+>->>+[<]<-]>>.>---.+++++++..+++.>>.<-.<.+++.------.--------.>>+.>++.

this is is is is:

"hello world"

lessggoo we did it! We have reached hello world finally. It's a good exercise to think about execution here too, I think spec itself does a good job at trying to explain it, so it's better to give that a try: https://github.com/sunjay/brainfuck/blob/master/brainfuck.md#hello-world-example

C build system setup

Okay, enough brain jog, time to get our hands dirty with the actual interpreter implementation. But before we start I want to clear up a few things. First, we are going to implement this interpreter in C, and I am not that good at writing idiomatic C code, so if you find some weird way of doing things, that's just me being a noob :P . Next, we will be using meson as our build system. You can change the build system as per your convenience; I am not doing any rocket science with it, so it shouldn't be hard to port to any other.

I am going to dump the whole file at once, it's not much and easy to understand:

project('brenphuk', 'c',
  version : '0.1',
  default_options : ['warning_level=3', 'default_library=static'])

readline_dep = dependency('readline').as_system()

# source: https://github.com/tiernemi/meson-sample-project/blob/master/meson.build
# This adds the clang format file to the build directory
configure_file(input : '.clang-format',
               output : '.clang-format',
	       copy: true)

run_target('format',
  command : ['clang-format','-i','-style=file', ['../interpreter.c']])

run_command('clang-format','-i','-style=file', 'interpreter.c', check: true)

executable('brenphuk',
           'interpreter.c',
           install : true,
		   c_args: ['-Werror', '-Wall', '-Wextra', '-Wshadow', '-Wconversion',
					'-Wcast-align', '-Wunused', '-Wpointer-arith', '-Wold-style-cast',
					'-Wundef', '-Winit-self', '-Wredundant-decls', '-Wmissing-include-dirs',
					'-Wswitch-default', '-Wswitch-enum', '-Wfloat-equal', '-Wformat-security',
					'-Wpedantic',
					'-g'],
           dependencies : [readline_dep],
)

We set our project name as brenphuk and declare readline as a dependency; we will be using that for REPL mode. Then we set up some formatting commands, code should always look good :) . Finally we define what our executable will be called and which files to build it from, with some c_args; I added a bunch of them just for more strictness, to help me not make mistakes. Last, we pass the readline dependency to the executable so they get built together.

Implementation

For the main impl, what we want to do is take input from the user, go over it character by character and perform the operation specified by the operator at that index. Let's call this core function of ours exec; it will accept an engine struct, which contains our actual tape and pointer:

typedef struct {
  char tape[TAPE_SIZE];
  int pointer;
} engine;

Along with engine, it will also accept a string, the program itself given as input from user.

int exec(engine *eng, char *prog) {
  size_t prog_len = strlen(prog);
  size_t i = 0;
  while (i < prog_len) {
    switch (prog[i]) {
    }
  }
}

Let's start adding impls of the operations now. For >, we just want to increment eng->pointer, so it's as simple as:

case '>':
  eng->pointer++;
  break;

and its opposite, <:

case '<':
  eng->pointer--;
  break;

Similarly for + and -, we want to increment or decrement the value in the tape at the index pointer:

case '+':
  eng->tape[eng->pointer]++;
  break;
case '-':
  eng->tape[eng->pointer]--;
  break;

For ,, we want to take input and store it in currently pointed cell

case ',': {
  char ch;
  scanf("%c", &ch);
  eng->tape[eng->pointer] = ch;
  break;
}

and for ., we want to output currently pointed cell's value

case '.':
  printf("%c", eng->tape[eng->pointer]);
  break;

Now come the slightly more interesting ones, the loop constructs. For [ and ], we want to jump to the corresponding loop operator on a certain condition. On finding a zero at a [ operator, we have to find the corresponding ] operator. As loops can be nested, we have to keep a count of the loop constructs we have seen, so we maintain a counter, increment it whenever we see a [ and decrement it when we see a ]. When we arrive back at the number we started our counter with, we have reached the corresponding ]. The impl for [ will look like:

case '[': {
  if (eng->tape[eng->pointer] == 0) {
    int brackets_depth = 0;
    while (i < prog_len) {
      if (prog[i] == '[') {
        brackets_depth++;
      } else if (prog[i] == ']') {
        brackets_depth--;
      }

      if (!brackets_depth) {
        break;
      }

      i++;
    }

    if (brackets_depth != 0) {
      ABORT("could not find matching closing square bracket");
    }
  }

  break;
}

Here we use brackets_depth to keep track of the counter we discussed above. We start brackets_depth at 0, so if it is not zero at the end of traversing the whole program, we print out a brackets mismatch error. A slightly different impl for ]:

case ']': {
  if (eng->tape[eng->pointer] != 0) {
    int brackets_depth = 0;
    while (i > 0) {
      if (prog[i] == '[') {
        brackets_depth--;
      } else if (prog[i] == ']') {
        brackets_depth++;
      }

      if (!brackets_depth) {
        break;
      }

      i--;
    }

    if (brackets_depth != 0) {
      ABORT("could not find matching opening square bracket");
    }
  }

  break;
}

And that's it. Of course we do need supporting code in main and things around the repl. I am not covering them in this blog because they are mostly irrelevant and easy to do. Still, to see the whole thing together, you can check: https://github.com/feniljain/brenphuk/tree/attempt_1

This implementation also contains a benchmark suite, which will be helpful when comparing the results of our different approaches in later sections. The benchmark programs are present in https://github.com/feniljain/brenphuk/tree/attempt_1/programs . When we compile the above program with -O3 and run them, we get these execution times:

Factor: ~24-25s
Mandelbrot: ~69-70s

Resources:

I have collected a few brainfuck resources to help here: https://github.com/feniljain/knowledge-base/blob/main/programming-languages/brainfuck/README.md

Faster shell boot times

Fri, 19 Jan 2024 00:00:00 GMT

Optimizing My Shell Startup Times

I was going through Thorsten's latest blog about faster startup times, in which he talks about shell startup times. Here is the direct link for the curious: https://registerspill.thorstenball.com/p/how-fast-is-your-shell . This made me wonder how fast my own shell config is, and pushed me to work on my shell load times to at least get them into a bearable range.

So to start, we need a way to measure our load times. Thorsten covers it nicely with this simple one-liner:

time zsh -i -c exit

and one data point is not reliable, so let's run it 10 times:

for i in $(seq 1 10); do time $SHELL -i -c exit; done

Eyeballing 10 results isn't great, so here's a script which gives you the average for one of the columns:

#!/bin/zsh

for i in $(seq 1 10); do
  2>&1 time $SHELL -i -c exit
done | awk '{sum += $11} END {print "Average:", sum/NR}'

(Note: this averages the total column.)

Cool, this looks good. I wish I could say the same for my shell startup times 😬. They were on the order of ~600ms, which is very bad: super heavy leaded shoes, in the terms of the original article :( Thankfully it also links to some amazing articles, most notably: https://htr3n.github.io/2018/07/faster-zsh/. I referred to this article throughout the whole process. I did not end up implementing all of its tricks, because I wanted to keep all my config in .zshrc and also keep it from getting too long.

The first tip is about profiling zsh, which is done using zprof. We can enable it by adding the following line to .zshrc:

zmodload zsh/zprof

or running zmodload zprof directly in shell (note the difference of zsh/). Running zprof afterwards shows where zsh spends most of its time, and hence gives us data points to start optimizing. But before diving straight into the profiling output, I decided to check out a few low-hanging/no-brainer things I could sort out myself. This was easy to do, considering I don't maintain my .zshrc very avidly.
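For reference, a profiling setup along those lines can be sketched as below. This is a minimal sketch and not necessarily the exact layout I used: it assumes the module is loaded on the very first line of .zshrc and the report is printed on the very last line, so that everything in between gets measured.

```shell
# --- first line of ~/.zshrc: start collecting profiling data ---
zmodload zsh/zprof

# ... the rest of your config (exports, plugins, oh-my-zsh, etc.) ...

# --- last line of ~/.zshrc: print the per-function time report ---
zprof
```

Open a new shell to see the report, and remove both lines once you are done, since the profiler itself adds a little overhead.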

To start, I had to figure out which config files zsh even considers. Thankfully htr3n's article covers the sequence in which config files are loaded, and hence also lists them. So I started checking those files and found unused nix, orbstack, etc. exports. First low-hanging fruits spotted! Next I moved on to .zshrc and cleaned up unused/old/irrelevant exports.

A few of the time-consuming operations are eval calls and subshell spawning. If you are a mac user and have something like brew --prefix in your config, that is an indication of subshell spawning. To optimize this, we can run the given command manually and then paste its output in place of the subshell spawning code. One thing to note is that this is a double-edged sword: any future change in brew's install locations would cause breakages for us, so do think about it before applying. Well, I applied it, and this instantly cut my startup time in half, i.e. we reached 300ms territory.
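As a rough illustration of that swap (not my actual config): the value /opt/homebrew below is an assumption, it is the default Homebrew prefix on Apple Silicon, while Intel Macs default to /usr/local. Paste whatever brew --prefix prints on your machine.

```shell
# Before: runs brew in a subshell on every single shell startup.
#   export HOMEBREW_PREFIX="$(brew --prefix)"

# After: the recorded output of `brew --prefix`, pasted in as a constant.
# ASSUMPTION: /opt/homebrew (Apple Silicon default); use your own value.
export HOMEBREW_PREFIX="/opt/homebrew"
export PATH="$HOMEBREW_PREFIX/bin:$PATH"
```

If brew ever moves its install location, this constant has to be updated by hand, which is the trade-off mentioned above.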

For our next target, the article mentions that version managers like rvm and nvm are super heavy and contribute a lot to startup times. That's easy, just stop using them? Right? Nope, we introduce some indirection. The godly trick which solves and breaks almost everything in computers. In this case, we define a function with the same name as the version manager, let's say nvm, move all the load code nvm requires into it, and call nvm at the very end of that function. In my case it ended up looking like this:

function nvm() {
    export NVM_DIR="$([ -z "${XDG_CONFIG_HOME-}" ] && printf %s "${HOME}/.nvm" || printf %s "${XDG_CONFIG_HOME}/nvm")"
    [ -s "$NVM_DIR/nvm.sh" ] && \. "$NVM_DIR/nvm.sh"
    if [[ -e ~/.nvm/alias/default ]]; then
      PATH="${PATH}:${HOME}/.nvm/versions/node/$(cat ~/.nvm/alias/default)/bin"
    fi
    [ -s "$NVM_DIR/bash_completion" ] && \. "$NVM_DIR/bash_completion"  # This loads nvm bash_completion

    # invoke the real nvm function now
    nvm "$@"
}

The result is a neat trick: the first time we call nvm, this wrapper function runs and loads everything, and sourcing nvm.sh replaces the wrapper with the real nvm function. So the loading no longer blocks startup or adds to load times. This reduced my startup times to ~100ms!
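The same pattern, stripped of the nvm specifics, can be sketched like this. The names heavytool and load_real_tool are hypothetical stand-ins: load_real_tool plays the role of sourcing a slow init script such as nvm.sh.

```shell
# Hypothetical stand-in for sourcing an expensive init script.
# It defines the real function, replacing the stub of the same name.
load_real_tool() {
  heavytool() {
    echo "real heavytool called with: $*"
  }
}

# Stub: cheap to define at startup. The first call loads the real
# implementation (which overwrites this stub), then forwards the arguments.
heavytool() {
  load_real_tool
  heavytool "$@"
}

# First call pays the loading cost; later calls hit the real function directly.
heavytool --version
```

The cost is moved from every shell startup to the first use of the tool, which is a good deal for tools that most shell sessions never touch.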

Some amazing progress for small and straightforward changes.

Now for another small win, we disable auto-updates using:

DISABLE_AUTO_UPDATE="true"

This brought us to ~80ms. Next I removed the battery and git plugins, which shaved off 10ms more, so ~70ms. This is also a good time to talk about those fancy shells. If you are someone who has a fancy prompt which shows git version, nvm version, battery level, neighbour's mom's number, then you will always suffer from relatively higher load times, because of the many computations the shell needs to do in the background to fill up that prompt. I personally moved off them long ago and now use the simple robbyrussell theme, which is super minimal and does not get in my way or try to occupy more space than necessary.

After this, according to zprof, most of zsh's time was spent on autocompletion, and in turn on compinit and related components. There are some really nice tricks to optimize these further. For example, compinit tries to read a cache file every time it is invoked; people have observed that this is not necessary and can be reduced to once per day, which gave some more small wins. After a while you hit a plateau, where moving the needle in your favour becomes harder and harder, and this is that territory. Well, maybe that's not quite true, because one big optimization is still available: removing oh-my-zsh. I tried that and it brought my load times down to ~20ms. The thing is, I don't want to move off omz yet. I tried to find some alternatives but wasn't happy with them; honestly I didn't spend much time looking either, so if I do find a better solution I will update this article! As a side note, one of the resources I am planning to check out is this gist: https://gist.github.com/laggardkernel/4a4c4986ccdcaf47b91e8227f9868ded, which was linked in the comments of the same article (ain't it a gold mine xD).
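The once-per-day compinit trick mentioned above is often written along these lines. This is a zsh-only sketch under a few assumptions: the completion dump lives at the default ~/.zcompdump, and nothing else (oh-my-zsh, for instance, calls compinit itself) has already run compinit for you. The variable stale_dumps is just an illustrative name.

```shell
autoload -Uz compinit

# Glob qualifiers: N = expand to nothing if there is no match,
# . = regular files only, mh+24 = modified more than 24 hours ago.
stale_dumps=(~/.zcompdump(N.mh+24))

if (( ${#stale_dumps} )) || [[ ! -e ~/.zcompdump ]]; then
  compinit              # full run, including the check for new completions
  touch ~/.zcompdump    # reset the clock so the next full run is a day away
else
  compinit -C           # skip the check and reuse the existing dump
fi
unset stale_dumps
```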

Another small thing which can help speed up the very first shell load (i.e. after a fresh boot) is removing ASL logs. These are Apple logs which on some systems can grow to be huge. One can prune them easily using the commands below (in the last one, !$ expands to the last argument of the previous command, i.e. *.asl):

cd /private/var/log/asl/
ls *.asl
sudo rm !$

Read more about its benefits/losses in this article: https://osxdaily.com/2010/05/06/speed-up-a-slow-terminal-by-clearing-log-files/

I looked at a few more things, like the hack mentioned in this article: https://coderwall.com/p/sladaq/faster-zsh-in-large-git-repository, which reduces git branch computation time. It wasn't that bad for me though, so I skipped this one, especially because it required changing an omz lib file. At this point I stopped attempting any more complex optimizations and rested my case for this time. (We will pick it up again for sure!)

If you would like to go through all of my changes together, this is the commit: https://github.com/feniljain/dotfiles/commit/202e22baee2164e4c38e18eb88db7a1b920f84c6#diff-d30bab601f4597c635d0bd4915f3475c4c22170a538d6781cd086bdfe100961fL5