You pretty much summed it up. An infrared seeker is much less sophisticated than an imaging one.
An IR seeker does exactly that: it seeks out contrasting IR sources against a programmed level of background IR radiation. Keep in mind that essentially everything is an IR emitter. 'Contrasting' means exactly that: discerning degrees of emission. The next step is to decide what to do with that discernment: focus or ignore.
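The focus-or-ignore decision can be sketched in a few lines. This is a toy illustration with made-up numbers, not any real seeker logic: compare each sensed intensity against a programmed background level and lock onto the strongest contrast.

```python
# Toy sketch of non-imaging seeker logic: everything emits IR, so the
# seeker only cares about contrast against a programmed background level.
PROGRAMMED_BACKGROUND = 300.0  # assumed background IR level, arbitrary units

def strongest_contrast(sources):
    """Return the source with the greatest emission above the programmed
    background, or None if nothing rises above it (nothing to focus on)."""
    above = [(level - PROGRAMMED_BACKGROUND, name)
             for name, level in sources.items()
             if level > PROGRAMMED_BACKGROUND]
    if not above:
        return None
    return max(above)[1]  # focus on the strongest contrast, ignore the rest

sensed = {
    "sky": 290.0,      # below background: ignored
    "cloud": 305.0,    # slight contrast: ignored in favor of a stronger one
    "exhaust": 900.0,  # strong contrast: focused on
}
print(strongest_contrast(sensed))  # -> exhaust
```

Note that this seeker has no notion of *what* the exhaust is; it only knows it contrasts most strongly, which is exactly its weakness against decoys discussed further down.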
Here is what a typical IR sensor sees: [image: Thermal Infrared]
We can see there are great contrasts in each IR image. So great, in fact, that it requires considerable imagination or creativity on our part to call an image 'cat' or 'house' or 'car'. The cruder the sensor, the less able we are to interpret the image as a 'cat' or 'house'; we see only high points of infrared emission.
To actually 'image' something, meaning to create what the human mind can understand, is much more complex in terms of sensor sensitivity and how the contrasts are processed. The process is similar to creating a grayscale rendering of a color photograph. In imaging,
NOTHING is ignored. But in order to create an image, we must be able to make those discernments between degrees of emission as fine-grained as possible. In other words, there is a difference between a system that can discern between 1 deg and 5 deg in one-degree increments:
1 - 2 - 3 - 4 - 5
...Versus a system that can discern between 1 deg and 5 deg in 0.005-degree increments:
1 - 1.005 - 1.010 - 1.015 - 1.020 - 1.025 - 1.030 - and so on to 5.
The latter is clearly a much more fine-grained sensor system and superior in terms of being able to create an image that our human minds can see without resorting to imagination or guessing.
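The difference between those two systems is a difference in quantization step. A minimal sketch, with arbitrary example readings, shows how a coarse sensor collapses two nearby emission levels into one reading while a fine-grained sensor keeps them distinct:

```python
# Sensor 'grain' as quantization: the same 1-5 degree range sampled at a
# coarse 1-degree step versus a fine 0.005-degree step.

def quantize(reading, step, floor=1.0):
    """Snap a reading to the nearest multiple of `step` above `floor`."""
    return floor + round((reading - floor) / step) * step

coarse = 1.0    # 1 - 2 - 3 - 4 - 5
fine = 0.005    # 1 - 1.005 - 1.010 - ...

a, b = 2.301, 2.498  # two nearby emission levels on the target

print(quantize(a, coarse), quantize(b, coarse))  # both snap to 2.0: indistinguishable
print(quantize(a, fine), quantize(b, fine))      # remain distinct: image detail survives
```

With the coarse step, the two levels become one blob of emission; with the fine step, they stay separate, which is the raw material an imaging system needs.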
The issue is quite philosophical, and this is where the word 'imagination' can be misleading. When we look at an object's discrete elements, such as a close-up of a man's ear, we can only recognize what we see as an 'ear'. We can 'imagine' the face to be that of a famous politician or a movie star. That is 'imagination'. But when we zoom out and see the other elements, we can immediately recognize the politician or the movie star himself, based upon previously known complete images of him. We no longer need to exercise our 'imagination'.
Infrared imaging is the same in philosophical terms. Infrared emission is one element of an object. Visible-wavelength emission is another. Radar reflection is another. Audio is another. Physical sensory input from direct contact is another. Obviously, for a flying aircraft we can rule out some elements, and that leaves us with visible wavelengths, radar reflections, and infrared emissions. We do not need to touch an aircraft to recognize it as such. We can hear it, but it takes training and experience to recognize one jet engine design from another. That takes too much 'imagination'. That leaves us with radar, visible, and IR wavelengths to work with.
In some situations, we can rule out the pilot's ability to exploit the visible spectrum. What if his head is turned away from the threat? What if he is incapacitated in some way? Or, even simpler, what if the object is too far away for the human eye? Then we move on to radar detection, but that has its limitations as well, and they are beyond the scope of this discussion. So now we have only IR wavelengths to work with.
When we speak of a complex body, we mean complex in terms of materials and physicality, or attributes...
A clothed human being has a different physicality, or different attributes, than a nude one. A nude human being does not have a 'collar' or a 'button'. A clothed human being does. Each element, from different surface dimensions to different materials, has a different IR emissivity level. Since we humans do not 'see' in the infrared wavelengths, what IR imaging does is sense, as finely as possible, those different emission levels from those different attributes, and convert them into a visible-wavelength representation of the complex body. The results are those highly colorful red/blue/yellow/purple/whatever images that we commonly see. The key here is the ability to discern emission levels when those levels are very close to each other, not just in emissivity but also in physical proximity.
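That conversion into a visible representation is, at heart, a false-color mapping. A minimal sketch, with an arbitrary palette and made-up emission values for the clothed-person example:

```python
# Hedged sketch of false-color thermal imaging: each emission level is
# mapped onto a visible color. Palette and break points are illustrative.
PALETTE = ["blue", "purple", "yellow", "red"]  # cold -> hot

def false_color(level, lo, hi):
    """Map an emission level in [lo, hi] onto the palette."""
    frac = (level - lo) / (hi - lo)
    frac = min(max(frac, 0.0), 1.0)          # clamp out-of-range readings
    index = min(int(frac * len(PALETTE)), len(PALETTE) - 1)
    return PALETTE[index]

# Each attribute of the clothed person emits slightly differently; a
# fine-grained sensor renders each as a distinct color (values invented).
attributes = {"skin": 34.0, "collar": 28.0, "button": 22.0}
lo, hi = 20.0, 36.0
for name, level in attributes.items():
    print(name, "->", false_color(level, lo, hi))  # skin red, collar yellow, button blue
```

The finer the sensor's grain, the more distinct colors it can justify assigning to physically adjacent attributes, which is exactly the discernment the paragraph above describes.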
With this ability, the system can ignore IR sources that do not match certain complete programmed objects, similar to how we can dismiss Joe Schmoe in favor of Brad Pitt when the two are grouped together and we want to focus on Pitt. Same thing if we want to focus on Sarkozy: we dismiss Pitt if both men are grouped together. If a flare is discharged, the missile will not be distracted, because it has a very fine-grained programmed representation of an 'F-15', so it knows that the IR emission that suddenly appeared is 'not F-15'. The more the system has to 'imagine' or 'guess' what competing IR sources are, the greater the odds of distraction and a miss. That is why an 'imaging' ability is needed.
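The 'not F-15' decision can be sketched as a crude shape test. This is a toy model with an invented one-dimensional 'template', nothing like a real seeker's signature library, but it shows why a point-like flare fails to match an extended airframe:

```python
# Toy flare rejection: compare the sensed hot-pixel pattern against a
# stored fine-grained template; ignore sources that do not match it.
F15_TEMPLATE = [0, 1, 1, 1, 1, 1, 0]   # extended shape: a run of hot pixels
FLARE        = [0, 0, 0, 1, 0, 0, 0]   # intense but point-like

def matches_template(scene, template, min_overlap=0.8):
    """Fraction of the template's hot pixels also hot in the scene."""
    hot = sum(1 for s, t in zip(scene, template) if t and s)
    return hot / sum(template) >= min_overlap

sensed_aircraft = [0, 1, 1, 1, 1, 1, 0]
print(matches_template(sensed_aircraft, F15_TEMPLATE))  # True: 'F-15', track it
print(matches_template(FLARE, F15_TEMPLATE))            # False: 'not F-15', ignore
```

Contrast this with the simple strongest-contrast seeker: the flare is the hotter source, but it is the wrong *shape*, and only an imaging system has the fine-grained data to know that.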