States are represented by nodes in the graph, i.e. a state is a node in the graph. So, I don't think there's any real difference, although some authors may distinguish the two (e.g. because you don't need to store all the information associated with a state while searching).
BFS explores the graph level by level (hence the name breadth-first, in contrast to depth-first search, which goes deep and then backtracks): it first explores all nodes that are $w$ away from the start node before exploring any node that is $w + 1$ away, where you can think of each edge of the (unweighted) graph as having a weight of 1.
BFS indeed needs to store a frontier in memory, i.e. the nodes we can explore next. The frontier is a FIFO data structure (a queue): you always remove the oldest node from it first, which can be done in constant time.
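To make the level-by-level behaviour and the FIFO frontier concrete, here is a minimal sketch in Python. It assumes the graph is given as a dictionary that maps each node to a list of its neighbours; the function name `bfs_levels` and the small example graph are made up for illustration.

```python
from collections import deque


def bfs_levels(graph, start):
    """Return a dict mapping each reachable node to its distance (in edges) from start."""
    distance = {start: 0}
    frontier = deque([start])  # FIFO queue: oldest node is at the left end
    while frontier:
        node = frontier.popleft()  # remove the oldest node in constant time
        for neighbour in graph[node]:
            if neighbour not in distance:  # first time we reach this node
                distance[neighbour] = distance[node] + 1
                frontier.append(neighbour)
    return distance


# Hypothetical example graph (adjacency lists stored in a dict).
graph = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["D"],
    "D": ["E"],
    "E": [],
}
levels = bfs_levels(graph, "A")
```

Because the frontier is a queue, every node at distance $w$ is removed before any node at distance $w + 1$, so the first distance recorded for a node is also the smallest one.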
If you use BFS to find a path, you need some way to compute or keep track of it. BFS implicitly generates a tree, where each node (other than the start node) has a parent, i.e. the node from which it was first reached. You can retrieve the path by following the parents backwards from the goal to the start node.
Note that graphs can have cycles, and you can reach the same node in different ways. So, you also need to mark a node as explored in some way, either with a variable that you associate with the state/node (the CLRS approach) or with an explored set (the AIMA approach). This set may end up containing all nodes, and it is what makes BFS an instance of graph search (see also this).
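As a rough sketch of how the parent pointers and the explored set fit together, consider the following Python function. The name `bfs_path`, the dict-of-lists graph representation and the example graph are my own assumptions, and here a node is marked as explored as soon as it is first generated, which is only one of the possible conventions.

```python
from collections import deque


def bfs_path(graph, start, goal):
    """Return a path with the fewest edges from start to goal, or None if there is none."""
    parent = {start: None}   # tree built by BFS: node -> node it was first reached from
    explored = {start}       # explored set: prevents revisiting nodes (e.g. in cycles)
    frontier = deque([start])
    while frontier:
        node = frontier.popleft()
        if node == goal:
            # Backtrack from the goal to the start by following the parents.
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for neighbour in graph[node]:
            if neighbour not in explored:
                explored.add(neighbour)
                parent[neighbour] = node
                frontier.append(neighbour)
    return None


# Hypothetical graph with cycles (S <-> A, S <-> B).
cyclic_graph = {
    "S": ["A", "B"],
    "A": ["S", "G"],
    "B": ["S", "A"],
    "G": [],
}
path = bfs_path(cyclic_graph, "S", "G")
```

Without the `explored` check, the cycle between `S` and `A` would cause the same nodes to be added to the frontier over and over again.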
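The running time also depends on how you represent the graph (e.g. with an adjacency list or an adjacency matrix) and on how you analyse the algorithm. For example, with adjacency lists, BFS takes $O(|V| + |E|)$ time, while with an adjacency matrix it takes $O(|V|^2)$ time, because finding the neighbours of a node requires scanning a whole row of the matrix. See also the CLRS book for a different perspective on this topic.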
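To illustrate why the representation matters, here is a small sketch of how the neighbours of a node could be retrieved under each representation. It assumes that nodes are numbered from 0, and the helper names are hypothetical.

```python
def neighbours_from_list(adj_list, node):
    """Adjacency list: the neighbours are stored directly, so we only touch existing edges."""
    return adj_list[node]


def neighbours_from_matrix(adj_matrix, node):
    """Adjacency matrix: we must scan the whole row, even if the node has few neighbours."""
    return [j for j, connected in enumerate(adj_matrix[node]) if connected]


# The same hypothetical directed graph in both representations.
adj_list = [[1, 2], [3], [3], []]
adj_matrix = [
    [0, 1, 1, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
]
```

BFS calls such a function once for each node it removes from the frontier, so the cost of each call (proportional to the number of neighbours in one case, and to the number of nodes in the other) adds up over the whole search.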