Comments to author (Associate Editor)
=====================================

Dear authors, thank you for your submission. Please address the minor constructive revisions proposed by the two independent reviewers.

----------------------------------------

Comments on Video Attachment: [None found]

Reviewer 3 of IROS 2022 submission 623

Comments to the author
======================

The paper presents interesting work on a challenging real-life scenario for a robotic manipulator. The authors mainly address the manipulation of single or multiple layers of clothing using tactile information collected from ReSkin sensors and a kNN classifier. The scenario seems realistic, and the experiments are well conducted. Overall, the work is of good quality, well written, and the thesis is well explained. I recommend acceptance.

Comments on the Video Attachment
================================

The video attachment is clear and illustrates the experimental procedure well. The quality of the video is good, and the captions are clear.

Reviewer 4 of IROS 2022 submission 623

Comments to the author
======================

The paper introduces the use of a relatively new tactile sensor, based on an array of magnetometers interacting with a conformable magnetic skin, for robotic dexterous manipulation of textiles. The challenge was to determine the number of layers of textile grasped from a stack. This was achieved using a simple k-means classifier trained on a limited set of data captured during real-world interaction. The performance of the new sensor was compared against random, open-loop, and vision-based baselines, all of which it outperformed.

The paper is nicely positioned against other works, with the novelty and contribution clearly presented. The data collection and experimental procedures are clearly described, with informative figures and concise captions. The results are positive, statistically significant, and a valuable contribution to the community.

Minor criticisms to improve the paper:

There is mention of 350 x 15-D measurements taken over the 5-second interaction episode, which presumably implies that the grippers are closed for ~1 second (400 Hz sample rate). Does this mean that the ~1 second of tactile data is presented as a single input example during training and subsequently during validation? This seems to contradict what is implied in Section IV.B, namely that point samples rather than time series are used. I would read this as time-series data, and therefore somewhat richer in information than, for example, a static image. Clarity on such details would make for a stronger paper.

Similarly, the vision baseline approach is not clearly described. ResNet is referred to, but there is no mention of dataset capture, image quality, perspective view, etc. A reference to methods used elsewhere in this regard should be cited.

Comments on the Video Attachment
================================

Very nice video that supports the paper.
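
To make the clarification Reviewer 4 requests more concrete, below is a minimal, purely illustrative Python sketch contrasting the two input framings the review distinguishes: the full 350 x 15-D episode treated as a single time-series training example versus each 15-D measurement treated as an independent point sample. The synthetic data, shapes, label set, neighbor count, and the choice of a kNN classifier (as described by Reviewer 3) are assumptions for illustration only and are not taken from the submission.

```python
# Illustrative sketch only: contrasts the two input framings Reviewer 4 asks
# about, using a kNN classifier on synthetic data shaped like the reported
# tactile episodes (350 timesteps x 15-D). All shapes, labels, and
# hyperparameters here are assumptions, not the authors' implementation.
# Note: 350 samples / 400 Hz ~= 0.875 s, i.e. the ~1 s of closed-gripper contact.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
n_episodes, timesteps, dims = 40, 350, 15            # 350 x 15-D per episode
episodes = rng.normal(size=(n_episodes, timesteps, dims))
labels = rng.integers(0, 3, size=n_episodes)         # e.g. 0/1/2 grasped layers

# Framing A: the whole ~1 s window is one training example (time-series input).
X_window = episodes.reshape(n_episodes, timesteps * dims)     # (40, 5250)
knn_window = KNeighborsClassifier(n_neighbors=3).fit(X_window, labels)

# Framing B: every 15-D timestep is an independent point sample, as Section
# IV.B seems to imply; an episode-level prediction could then be a majority
# vote over the per-timestep predictions.
X_points = episodes.reshape(-1, dims)                          # (14000, 15)
y_points = np.repeat(labels, timesteps)
knn_points = KNeighborsClassifier(n_neighbors=3).fit(X_points, y_points)

per_step = knn_points.predict(episodes[0].reshape(-1, dims))   # 350 predictions
episode_pred = np.bincount(per_step).argmax()                  # majority vote
```

Either framing is compatible with the reported 350 x 15-D figure; stating in Section IV.B which one is actually used (and, if framing B, how per-timestep predictions are aggregated) would resolve the ambiguity the reviewer raises.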