Traditionally, organizations have tried to understand consumer attitudes and behaviors using market research data and transaction data.
However, the attitudes and behaviors measured this way reflect past emotions. They cannot capture sentiment and emotion at a particular moment, and it is that momentary state that explains the tangible actions a person takes at that moment. For example, when a consumer walks the aisle of a superstore scanning products on the shelf, it would be valuable to capture their mood and relate it to their product consideration and, eventually, their buying behavior. Video analytics, combined with advances in big data analytics, can play a crucial role in measuring emotions in real time.
BUSINESS APPLICATION
This approach has many potential applications, from retail stores to security.
In a retail store, for example, mood recognition can help gauge the consumer experience in outlets and stores, which plausibly complements existing consumer loyalty programs. In the old days, each locality had a store run by an individual who offered a more consumer-centric shopping experience than any loyalty program does today. This was possible only because fewer consumers visited each store, so the shopkeeper knew everyone personally, could talk to each of them and could tailor offers at the individual level. With the advent of video analytics and advances in big data technology, it is now possible to measure a consumer's mood at a particular moment and use it in a variety of scenarios.
One can measure the overall mood of consumers in a store (net sentiment, or net 'mood') and relate it to store sales on a daily, weekly or monthly basis. Video analytics can also be used to gauge people's reactions and mood when they are presented with new products on the shelf or with new promotions and offers highlighted inside the store. The ultimate application is to relate consumer mood and emotions to actual purchasing behavior at the level of the individual consumer.
The video analytics process has four components: face detection, face discrimination, face recognition and mood detection.
FACE DETECTION
Face detection is the first step of video analytics: given an image, can we tell whether it contains a human face or something else? Our analytics application detects faces in a live video feed. Such a feed may contain background noise, non-human faces, motion blur and so on. We have therefore used a variety of techniques to remove these noise elements and detect the face or faces to be saved in the repository and made available for analysis. Contrary to our intuition, this turned out to be a tougher problem than the other steps. The application must also distinguish individual faces when a single video frame contains several.
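The text does not say which noise-removal techniques were used. A common pattern is to post-filter the raw candidates from a face detector (for example a Haar cascade or a CNN detector). In that pattern, small or low-confidence boxes are dropped, and non-maximum suppression (NMS) ensures each face in a frame is kept only once. The sketch below assumes the detector returns `(x, y, width, height)` boxes with confidence scores. The thresholds and function names are hypothetical and would need tuning on real footage.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height) in pixels


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    ix = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    iy = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def filter_detections(boxes: List[Box], scores: List[float],
                      min_size: int = 24, min_score: float = 0.5,
                      iou_threshold: float = 0.3) -> List[Box]:
    """Drop tiny or low-confidence candidates (likely noise), then apply
    non-maximum suppression so each face in the frame is kept exactly once."""
    candidates = [
        (s, b) for b, s in zip(boxes, scores)
        if s >= min_score and b[2] >= min_size and b[3] >= min_size
    ]
    candidates.sort(key=lambda sb: sb[0], reverse=True)  # most confident first
    kept: List[Box] = []
    for _, box in candidates:
        # Keep a box only if it does not heavily overlap an already-kept face.
        if all(iou(box, k) < iou_threshold for k in kept):
            kept.append(box)
    return kept


# Hypothetical detector output for one frame: two overlapping hits on the
# same face, one separate face, and one tiny (noise) box.
boxes = [(10, 10, 50, 50), (12, 12, 50, 50), (200, 40, 60, 60), (5, 5, 10, 10)]
scores = [0.9, 0.8, 0.95, 0.99]
print(filter_detections(boxes, scores))
```

Here the tiny box is rejected by the size filter and the duplicate hit at `(12, 12)` is suppressed, leaving two distinct faces.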
FACE DISCRIMINATION
Next comes face discrimination. The key problem is to discriminate among the faces detected in the video feed from one frame to the next. This avoids storing multiple images of the same person and lets us pick the optimal picture, the one with the most detail, which in turn improves performance in the face recognition stage.
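The text does not describe how faces are linked across frames or how the "optimal picture" is chosen. As a hedged illustration, the sketch below uses two simple stand-ins. Detections are linked across frames by nearest box centre within a fraction of the box width. Among each person's crops, it keeps the one with the highest variance of the Laplacian, a common sharpness (blur) measure. The function names, the `max_shift` parameter and the data layout are all hypothetical.

```python
import numpy as np


def sharpness(gray):
    """Variance of a 4-neighbour Laplacian: higher means more detail, less blur."""
    g = gray.astype(np.float64)
    lap = (g[:-2, 1:-1] + g[2:, 1:-1] + g[1:-1, :-2] + g[1:-1, 2:]
           - 4.0 * g[1:-1, 1:-1])
    return float(lap.var())


def centre(box):
    x, y, w, h = box
    return x + w / 2.0, y + h / 2.0


def best_face_per_person(frames, max_shift=0.5):
    """frames: list of frames, each a list of (box, crop) pairs, where box is
    (x, y, w, h) and crop is a 2-D grayscale array of that face.

    Each detection joins the nearest unclaimed track whose last centre lies
    within max_shift * box width; otherwise it starts a new track. Tracks never
    expire here, a simplification. Returns the sharpest crop of each track."""
    tracks = []  # each track: {"last": box, "best": crop, "score": float}
    for detections in frames:
        used = set()  # tracks already claimed by a detection in this frame
        for box, crop in detections:
            cx, cy = centre(box)
            best_i, best_d = None, max_shift * box[2]
            for i, t in enumerate(tracks):
                if i in used:
                    continue
                tx, ty = centre(t["last"])
                d = float(np.hypot(cx - tx, cy - ty))
                if d <= best_d:
                    best_i, best_d = i, d
            score = sharpness(crop)
            if best_i is None:
                tracks.append({"last": box, "best": crop, "score": score})
                used.add(len(tracks) - 1)
            else:
                t = tracks[best_i]
                t["last"] = box
                if score > t["score"]:  # keep only the most detailed image
                    t["best"], t["score"] = crop, score
                used.add(best_i)
    return [t["best"] for t in tracks]


# Hypothetical crops: random texture stands in for a sharp face,
# a flat grey patch for a badly blurred one.
rng = np.random.default_rng(0)
sharp = rng.integers(0, 256, (40, 40)).astype(np.uint8)
blurry = np.full((40, 40), 128, dtype=np.uint8)
frames = [
    [((10, 10, 40, 40), blurry), ((200, 10, 40, 40), sharp)],
    [((14, 12, 40, 40), sharp), ((202, 10, 40, 40), blurry)],
]
faces = best_face_per_person(frames)
print(len(faces))
```

Two people appear in two frames, so only two images are stored, and for each person the sharper crop is kept.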
FACE RECOGNITION
The next step is to recognize a facial image against a repository of saved faces. A facial image is a point in a high-dimensional image space; mapping it to a lower-dimensional representation makes classification easier. A face image is ultimately treated as a data set (a three-dimensional matrix, e.g. height × width × color channels), and each image is expressed as a vector that characterizes it relative to the other images. To compare an image with the database, or to characterize it at all, all images must have the same size and the same layout, i.e. the face must occupy the same portion of each image.
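Characterizing each face as a vector relative to the other images closely resembles the classic eigenfaces approach: principal component analysis (PCA) on aligned faces, followed by nearest-neighbor matching. The sketch below uses that technique as an assumption, since the text does not name its representation. It works on grayscale images for simplicity. It also shows why the same size and layout matter. Each image is flattened pixel by pixel, so pixel i must refer to the same facial location in every image. The function names and the random stand-in gallery are hypothetical.

```python
import numpy as np


def fit_eigenfaces(gallery, n_components):
    """gallery: array (n_images, height, width) of aligned, same-size grayscale
    faces. Returns the mean face, the principal axes and the gallery's
    low-dimensional coordinates."""
    X = gallery.reshape(len(gallery), -1).astype(np.float64)  # one row per image
    mean = X.mean(axis=0)
    # Rows of vt are the principal directions ("eigenfaces") of the centred data.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    axes = vt[:n_components]
    coords = (X - mean) @ axes.T  # each face as a short vector
    return mean, axes, coords


def recognise(probe, mean, axes, coords):
    """Project a probe face into the same space and return the index of the
    nearest gallery face (Euclidean distance)."""
    assert probe.size == mean.size, "probe must match the gallery's size and layout"
    v = (probe.reshape(-1).astype(np.float64) - mean) @ axes.T
    return int(np.argmin(np.linalg.norm(coords - v, axis=1)))


rng = np.random.default_rng(42)
gallery = rng.random((5, 32, 32))                   # stand-in for 5 aligned faces
probe = gallery[2] + rng.normal(0, 0.01, (32, 32))  # a noisy new shot of face 2
mean, axes, coords = fit_eigenfaces(gallery, n_components=4)
print(recognise(probe, mean, axes, coords))
```

Each 1,024-pixel image is reduced to a 4-element vector, and recognition becomes a nearest-neighbor search in that small space.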
MOOD DETECTION
Mood can be defined and detected from the change, and the rate of change, over a series of frames in the elements of the data vector that characterizes each image. Therefore, to detect mood, one has to analyze the same detected face across the series of frames captured by the video.
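The change and rate-of-change features described above can be sketched as follows. The sketch assumes one feature vector per frame for the same tracked face; these could be, for instance, the low-dimensional coordinates from the recognition stage or facial landmark positions, though the text does not specify. It also assumes a known frame rate `fps`. The mapping from these features to a mood label, e.g. a trained classifier, is not described in the text and is not shown. The function name and summary statistics are hypothetical choices.

```python
import numpy as np


def change_features(vectors, fps):
    """vectors: array (n_frames, n_features), one feature vector per frame for
    the same tracked face (n_frames >= 2). Returns the frame-to-frame change,
    the rate of change per second, and a fixed-length summary vector."""
    v = np.asarray(vectors, dtype=np.float64)
    change = np.diff(v, axis=0)          # (n_frames - 1, n_features)
    rate = change * fps                  # change per second
    summary = np.concatenate([
        v[-1] - v[0],                    # net change over the window
        np.abs(rate).mean(axis=0),       # average speed of change
        np.abs(rate).max(axis=0),        # peak speed of change
    ])
    return change, rate, summary


# Hypothetical track: feature 0 rises steadily, feature 1 stays constant.
track = [[0, 1], [1, 1], [2, 1], [3, 1]]
change, rate, summary = change_features(track, fps=10)
print(summary)
```

The fixed-length `summary` vector is what one might feed to a mood classifier. Its length does not depend on how many frames the face was visible for.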