Database access with functional programming

Question

I'm interested in becoming more familiar with functional programming as a paradigm, so I'm trying to introduce a more functional style to some of my projects. I'm struggling to understand how to handle side effects with a database.

I have some functions that kind of look like this:

                 db query
                     +
db query             |
   +                 |
   |                 v
   |      +--------->a()
   v      |
f(type)+--+
          |
          +--------->b()

The trouble is that both f and a are non-pure functions because they need to do database queries. I've seen some functional projects that work by having all the state in a single place and the rest of the application takes bits and pieces of state as function parameters. I can replicate something like that here by putting all the queries in f for example, but since b doesn't need the database queries used by a, this would be really inefficient.

Is there a pattern for handling database access in functional programs?

How inefficient it can be if you load all required data and pass it to f as a parameter? In most of the cases it is not significant, especially in cases where you can load data asynchronously. — Fabio, Feb 06 '21 at 19:14
Does this answer your question? How does persistence fit into a purely functional language? — gnat, Feb 06 '21 at 19:23
What do you mean by "query?" I wouldn't consider a SELECT statement to be a side effect, but an INSERT certainly would. — Greg Burghardt, Feb 06 '21 at 20:03
@GregBurghardt, I think by "side effects" OP mean all kinds of access to external resources(database) which can return "unpredictable" result. — Fabio, Feb 06 '21 at 20:35
It's worth noting that SQL itself reflects a functional paradigm, and is addressed to the same sorts of problems. A SELECT statement is often a pure function of the database, and the transaction control system prevents "side effects" from one transaction being interwoven with the processing being performed in another. — Steve, Feb 06 '21 at 21:58
@Steve, sorry i would say about "pure function' as function which for same input always return same output. SQL SELECT statement returns data from the file system, which can be updated between select statements. — Fabio, Feb 06 '21 at 22:03
@Fabio, a select statement does always return the same output for the same input. I grant you that the tables can be updated between calls of the same statement, but similarly, the parameters of a function can also be changed between each call. The transaction control system can also ensure that, even if the tables are updated from elsewhere, a second call of a given select statement, occuring within a single transaction, continues to see the same data as the first call. I don't see a significant difference here, other than how the "inputs" are represented syntactically. — Steve, Feb 06 '21 at 22:29
@Steve it's not a Pure Function if the same arguments don't cause the same return every time. This is called Referential Transparency.The state of the database or even the time of day can't change the result a Pure Function if they aren't explicitly passed in. Thus a Pure Functions parameters show you exactly what it depends on. A Pure Function doesn't hide those dependencies inside. So no sneaky calling SELECT from inside a Pure Function. — candied_orange, Feb 07 '21 at 19:18
@candied_orange, the same arguments to a select query do cause the same return every time. The state of the database is explicitly passed in - that's what the table names are! In relational algebra, the tables are literally the operands to an operator (i.e. the parameters to a method call). That's why I said the only distinction here is syntactical. The dependencies are not hidden inside - the query statement describes the inputs and the operators, and the database engine produces an output that is effectively a pure function of those inputs. (1/2) — Steve, Feb 07 '21 at 19:48
I concede that most databases are not in fact pure in this respect, and have a variety of stored settings and so on that can affect results independently of the inputs described in the query statement, but so do ordinary functional languages. The point is that the principles being employed here are identical. Also, whilst the tables are not immutable, there is no reason as inputs why they should be immutable across separate calls of the same query. Parameters in functional languages are not expected to be immutable beyond the lifetime of the call. (2/2) — Steve, Feb 07 '21 at 19:53
@Steve Referential Transparency means you can predict (and even cache) the return so long as the arguments are then same without caring about the state of anything else, database or otherwise. Change can only come in through arguments. And "a select statement does always return the same output for the same input." is just wrong. That isn't how it works. A select statement reveals database state. The return of a Pure Function is expected to not change, well beyond the lifetime of the call, unless the arguments change. — candied_orange, Feb 07 '21 at 20:10
@candied_orange, I repeat, the tables are the arguments in SQL. You're right that a very rudimentary select may do nothing but return it's input (the table referenced in the from clause) and "reveal the database state" - that's because the SQL language requires a select clause to be present in the query, even when it represents no additional operation being applied to the input. The return of a select query will basically never change across calls, unless it's arguments (the tables and their contents) change. You need to grasp that crucial bit: the tables are the arguments. — Steve, Feb 07 '21 at 20:53
@Steve doesn’t mater if they are. That still isn’t referential transparency. — candied_orange, Feb 07 '21 at 21:09
@Steve, Today SELECT COUNT(*) FROM MyTable returns 99. Next week, same query(with same arguments as you said) SELECT COUNT(*) FROM MyTable will return 120. With pure function sum(12, 30) returns 42 today and next week it will also return 42. With pure function I can save final result and never call this function anymore for arguments 12 and 30, but I can not do it for SQL statements, I need to execute actual sql statement again. — Fabio, Feb 07 '21 at 21:28
@candied_orange, the table references - at least within a single select statement - are unquestionably transparent. Multiple references to the same table within a select statement, all see the same copy of that table. And the references can be made automatically transparent across multiple select statements within a single transaction, by setting the transaction control system to the appropriate mode. — Steve, Feb 07 '21 at 21:31
@Fabio, but MyTable represents 99 rows in the first case, and 120 rows in the second. It's no different from having sum(a, b), where the values of 'a' and 'b' are changed between calls - being 12 and 30 the first time, then 40 and 55 the next week. Similarly, if you provided a literal table in the from clause (in the same way as you provided literal integers to the sum operator, and instead of providing a table reference) then the result would always be the same and could never change until the hard-coded literals were changed. — Steve, Feb 07 '21 at 21:39
@Fabio, just for clarity, a table literal would be represented in SQL (SQL Server, that is) something like SELECT SUM(i) FROM (VALUES (12), (30)) AS x(i). This will always return a scalar value of 42, but it does so by constructing an input table, then summing the i column. Similarly, a COUNT(*) would always return a value of 2 (that being the count of rows within the table literal). These results will not change week to week, because the inputs are hard-coded. — Steve, Feb 07 '21 at 21:48
@Steve, internally yes, but may be not from the database consumer perspective, who sends "SELECT COUNT(*) FROM MyTable" string to the database and receives different results at different time, which is the case for OP's example. — Fabio, Feb 07 '21 at 22:31
@Fabio, that's an interesting observation, but I don't see how it bears on the question of whether SQL reflects a functional paradigm. The database consumer gets different results at different times, simply because he refers to input tables whose contents have changed between each call. If he puts different data into the query, he gets different results out - that's entirely consistent with a query being a "pure function" of those inputs. (1/2) — Steve, Feb 07 '21 at 23:04
That the database consumer does not supply the data to the server is immaterial - he supplies the reference to the data already on the server, which forms the inputs to the query. This is exactly as a client-side functional program would refer indirectly to files or data sources around the entry point of the program (and quite probably stored on a network server) - it wouldn't pipe every piece of data in through the command line or hard-code every value. (2/2) — Steve, Feb 07 '21 at 23:11

Fabio · Answer 1 · 2021-02-06T20:32:15.020

Extract important business logic out of application dependencies and "wrap" it with database calls or calls to other external resources.

-- Load data
-- Process data
-- Save data based on processed result

Code structure will look like below:

# --> logic with side effects
data = load_data_from_external_resources()

# --> pure logic without side effects
result = business_logic.process(data) 

# --> logic with side effects based on result
if result.has_records
  save(result.records)
end

if result.should_notify_others
  notify_others(result.message)
end

With such approach business logic will be totally independent of application technologies(database, file system or third parties) in design and runtime.

For very complicated cases where loading all required data is not possible because of performance issues we can introduced small bits of business logic and "stack" layers of application and business logic on each other

# --> load minimum required data
data = load_minimum_data()
--> pure logic without side effects
next_steps = business_logic.determine_next_steps(data)
if next_steps.need_more_data?
--> load more data
data = data + load_more_data()
end
--> pure logic without side effects
result = next_steps.process(data)
--> logic with side effects based on result
if result.has_records
  save(result.records)
end

Downvoter, please explain what is wrong with the answer, I will be happy to fix or delete it. — Fabio, Feb 07 '21 at 17:50
This kind of approach is often called “functional core, imperative shell”. Such an architecture is a great fit for many problems, and corresponds nicely to concepts such as hexagonal architecture or onion architecture. — amon, Feb 07 '21 at 19:49

score 6 · Answer 2 · answered Feb 06 '21 at 18:36

f only needs to do a database query because it needs a which needs a database query.

a only needs to do a database query because no one has passed it what the result of that query would be.

Solution: pass a what it needs.

Side effects exist in every useful program. Otherwise you'd see nothing on the screen. The issue is where they exist. Programming functionally isn't eliminating all side effects. It's being formal with them. Only permitting them to exist in well controlled places away from all your interesting, needs testing and debugging, business logic. That's where you keep your pure functions. Move side effects to boring structural code that people understand at a glance.

The biggest impact here is that a no longer controls when the database is queried. But if you want a to be pure that's the cost it comes at.

Database access with functional programming

2 Answers2

--> pure logic without side effects

--> load more data

--> pure logic without side effects

--> logic with side effects based on result